IT4OD. October 19-20, 2014. Department of Mathematics and Computer Science, University of Tebessa. Proceedings



IT4OD, October 19-20, 2014. Department of Mathematics and Computer Science, University of Tebessa. Proceedings of the International Conference on Information Technology for Organization Development. Edited by the Conference General Chair, Hakim BENDJENNA, and the Program Chairs: Makhlouf DERDOUR, Louardi BRADJI, Mohamed Ridda LAOUAR. IT4OD 2014

International Conference on Information Technology for Organization Development, IT4OD 2014


Table of Contents

Preface 1
Keynotes Speakers 5
Contribution Papers 7

Topic 1: Software Engineering 8
- Lamia GAOUAR, Abdelkrim BENAMAR and Fethi Tarik BENDIMERAD. Requirements of cross platform mobile development tools 9
- Khelifa BENAHMED and Amel DOULI. Design of a New Smart-Irrigation System in the South of Algeria 16
- Ahmed MOKHTARI and Salima HASSAS. The evolution of collaboratif tagging systems and the relation with the emergence of language: Agent based modeling 21
- Tarek ABID and Hafed ZARZOUR. Integrating Linked Open Data in Geographical Information System 26
- Messaoud ABBAS. Using FoCaLiZe to Check OCL constraints on UML classes 31
- Khelifa BENAHMED and Amel DOULI. Design of smart grid system in south Algeria 39

Topic 2: Security and Access Management 46
- Tahar MEKHAZNIA and Abdelmadjid ZIDANE. Swarm intelligence algorithms in cryptanalysis of Simple Feistel Ciphers 47
- Mustapha HEMIS and Bachir BOUDRAA. Secure audio watermarking algorithm: An application for copyright protection 53
- Meriem ZAITER, Salima HACINI and Zizette BOUFAIDA. La tolérance aux fautes à base d'agents pour les systèmes domotiques 59
- Othaila CHERGUI, Abdallah MERAOUMIA and Hakim BENDJENNA. Robust Multimodal Personal Identification System Using Palmprint & Palm-vein Images 65
- Salah Ahmed OTHMAN and Tarik BOUDGHENE STAMBOULI. Core Points Detection for Touch Based and Touch-less Fingerprint Images 71
- Ahmed AHMIM and Nacira GHOUALMI-ZINE. A hierarchical intrusion detection system based on combining predictions of different classifiers 76

Topic 3: Pervasive Systems and Multi-criteria Decision 82
- Imene BENATIA, Mohamed Ridda LAOUAR and Hakim BENDJENNA. Cloud based Decision Support System for urban planning 83
- Lakhdar LAIMECHE, MEROUANI Farida Hayet and Mazouzi SMAINE. New feature vector for steganalysis 89
- Lydia YATAGHENE, AIT-IDIR Kahina, DEBA Abbassia and EDDOUD Abdelkader. A Cloud Computing model for Algerian universities 95
- Salsabil GOUCEM, Saddek BENABIED and Hakim BENDJENNA. AFuzzyChoq: A Multi Criteria Decision Method for Classification Problem 101

Topic 4: Networks and Embedded Systems 108
- Aicha AGGOUNE, Abdelkrim BOURAMOUL and Mohamed Khiereddine KHOLLADI. Personnalisation d'accès aux sources de données hétérogènes pour l'organisation des grands systèmes d'information d'entreprise 109
- Noureddine HAOUARI and Samira MOUSSAOUI. Segment-based Local Density Estimation for VANETs 116
- Soltane MERZOUG, Makhlouf DERDOUR and Mohamed GHARZOULI. CCBRP: A protocol based QoS for mobile multimedia sensor networks 121
- Nour El Houda BAHLOUL, Mohamed Rida ABDESSEMED and Abdelmajid ZIDANI. Bio-inspired Routing Protocol, Oriented Maintaining of Connectivity by Mobility Control 128
- Fateh BOUTEKKOUK and Soumia OUBADI. Periodic/Aperiodic Tasks Scheduling Optimization for Real Time Embedded Systems with hard/soft constraints 135

Topic 5: WEB Technology and Knowledge 141
- Mohamed GHARZOULI and Djamel BENMERZOUG. Developing a Knowledge Distributed Architecture to Discover and Compose Semantic Web Services 142
- Houda EL BOUHISSI and Mimoun MALKI. Building Web Service Ontology: A Reverse Engineering Approach 150
- Adel ALTI, Mourad SAADAOUI and Makhlouf DERDOUR. Semantic Social-Based Adaptation Approach for Multimedia Documents in P2P Architecture 156
- Mohamed BELAGGOUNE, Saliha OUKID and Nadjia BENBLIDIA. Ontology driven graph matching approach for automatic labeling brain cortical SULCI 162
- Razika DRIOUCHE, Nawel KEMCHA and Houda BENSASSI. Processus de construction d'ontologie basé sur l'extraction automatique des connaissances à partir des textes structurés 168
- Mohamed GASMI and Mustapha BOURAHLA. Le raisonnement sur la logique de description floue et décomposée 174
- Takashi MATSUHISA. Common-Knowledge, Communication and Cooperation Management: Epistemic Approach to Cooperative Management 181

Topic 6: Artificial Intelligence and Multimedia 188
- Nadia ZIKIOU, Mourad LAHDIR and Soltane AMEUR. Application des Curvelets et Régression SVM pour la Compression d'Images 189
- Borhen Eddine DAKKAR and Fella HACHOUF. No-reference Image Quality Assessment for JPEG Compressed Images Using Natural Scene Statistics And Spatial Pooling
- Assia BELOUCIF and Lemnouar NOUI. A symmetric image encryption scheme based on diagonal matrices and the XOR operation 200
- Nour El-Houda BENALIA, Nesrine OUANNES and Noureddine DJEDI. An Improved CUDA based Hybrid Metaheuristic for Fast Controller of an Evolutionary Robot 205
- Amir BENZAOUI. Face Recognition Using Local Binary Patterns in One Dimensional Space and Wavelets 211
- Ahlem MELOUAH and Aicha TAMEUR. Seeded selection for 2D medical image segmentation based on region growing algorithm: a survey 218

Topic 7: Poster 222
- Djoudi KERFA and Mohamed F. BELBACHIR. A New Moving Object Detection Method for Visual Surveillance 223
- Karima DJEBAILI and Lamine MELKEMI. Cryptanalysis of discrete logarithms based cryptosystem using continued fraction and Legendre's result 229
- Mohammed SAIGAA, Abdallah MERAOUMIA, Salim CHITROUB and Ahmed BOURIDANE. Fusion of Finger-Knuckle-Print & Finger-Vein-Print for Personal Identification Using PCA 234
- Fatiha OUAZAR, Malika IOUALALEN and Mohand Cherif BOUKALA. Verification of modular systems 240
- Karima BENHAMZA. Intelligent traffic light control using an adaptive approach 246
- Manel GHERARI, Abdelkrim AMIRAT and Mourad OUSSALAH. Describing Ubiquitous Mobile Cloud Systems 251
- Khelifa BENAHMED, Abla SMAHI and Souhila HIRECHE. Smart grid in Algeria: ambition towards a great expectation 259
- Ahcene Youcef BENABDALLAH and Rachid BOUDOUR. Une nouvelle approche de développement des IPS pour les SOCs à base du SystemC 265

Preface

It is with great honor and pleasure that we welcome you to the International Conference on Information Technology for Organization Development (IT4OD 2014). The conference is sponsored by the IEEE Algeria subsection and was held at the University of Tebessa, Algeria, on October 19-20, 2014. IT4OD continues to be a premier meeting attracting international participation in Information Technology for Organization Development. This volume of the proceedings contains the abstracts of the two invited talks, together with the papers selected for presentation and for the poster session. The papers were categorized into six topics and a poster session: Software Engineering; Security and Access Management; Pervasive Systems and Multi-criteria Decision; Networks and Embedded Systems; WEB Technology and Knowledge; and Artificial Intelligence and Multimedia. We would like to take this opportunity to thank all the authors for their submissions to IT4OD 2014, both those whose papers were accepted and those whose papers could not be included. We have the honor of welcoming our very notable keynote speakers at IT4OD 2014. We are very thankful to Professor Takashi MATSUHISA from the Russian Academy of Sciences for finding the time to present a keynote on his very extensive work on Common-Knowledge, Communication and Cooperation Management, a topic which is especially relevant to our conference themes. Many thanks also to our distinguished Pr. Hafid HAFFAF, Oran Es-Senia, IGMO, Algeria, for accepting our invitation and for honoring the conference with his presence and his inspiring talk on Vehicular Network Infrastructure. We are indeed very grateful for the support of a large number of individuals and organizations without which this conference would not have been possible: - The Program Committee (PC) consists of experts in the various subfields covered by the conference. We would like to thank the PC members for their great efforts within a very short time.
- The Local Organizing Committee managed everything following the policy of IT4OD 2014. We would like to thank all members of the Local Organizing Committee, as well as the researchers who supported the conference in a multitude of ways. We would also like to express our deep appreciation and gratitude to the conference sponsors for their support: the University of Tebessa, the IEEE Algeria subsection, and the LAMIS Laboratory. We hope that these proceedings, both in hardcopy and on CD-ROM, will provide a valuable resource for researchers in Information Technology and Organization Development, both those who have travelled to the University of Tebessa, Algeria, to participate in this conference, and those who could not attend and will miss the inspiring presentations and discussions but will, at least, have this record of the conference activities. We wish the delegates a most enjoyable and memorable conference. IT4OD 2014 Chair: Hakim BENDJENNA. IT4OD 2014 Program Chairs: Makhlouf DERDOUR, Louardi BRADJI, Mohamed Ridda LAOUAR.

International Conference on Information Technology for Organization Development, IT4OD 2014. Conference Committees

Conference Chair:
- Hakim BENDJENNA

Honorary Conference Chair:
- Pr. Said FEKRA, Rector of Tebessa University

Program Chairs:
- Makhlouf DERDOUR
- Louardi BRADJI
- Mohamed Ridda LAOUAR

Technical Program Committee:
- Abdel ENNAJI, University of Rouen, France
- Abdelkrim AMIRAT, University of Souk Ahras, Algeria
- Abdelkrim BOURAMOUL, University of Constantine 2, Algeria
- Abdelkader BELKHIR, USTHB, Algeria
- Abdelouahed GHARBI, School of Superior Technologies, Canada
- Adil ANWAR, University of Rabat, Morocco
- Ahmed BOURIDANE, University of Newcastle, UK
- Aurora VIZCAINO, University of Castilla-La Mancha, Spain
- Axel Van LAMSWEERDE, University of Louvain, Belgium
- Boualem BENATALLAH, University of New South Wales, Australia
- Chawki DJEDDI, University of Tebessa, Algeria
- Congduc PHAM, UPPA, France
- Djamel BENMERZOUG, University of Constantine 2, Algeria
- Djamel SAMAI, University of Kasdi Merbah Ouargla, Algeria
- Faiez GARGOURI, University of Sfax, Tunisia
- Farid MOKHATI, University of Oum el Bouaghi, Algeria
- Farida SEMMAK, University of Paris-Est, France
- Fattah ZIRARI, University of Ibn Zohr, Morocco
- Filippo LANUBILE, University of Bari, Italy
- Haikal EL ABED, Technical Trainers College, Saudi Arabia
- Hayet MEROUANI, Badji Mokhtar University, Algeria
- Ivan JURETA, University of Namur, Belgium
- Imran SIDDIQI, University of Bahria, Pakistan
- Javier Gonzalez HUERTA, Polytechnic University of Valencia, Spain
- John MYLOPOULOS, University of Trento, Italy
- Karim ZAROUR, University of Constantine 2, Algeria
- Klaus POHL, University of Essen, Germany
- Lawrence CHUNG, University of Texas, USA
- Lin LIU, Tsinghua University, China
- Lotfi BOUZGUENDA, University of Sfax, Tunisia
- Luca BREVEGLIERI, Politecnico di Milano, Italy
- Malika LOUALALEN, USTHB, Algeria
- Marc DALMAU, University of Pau, France
- Marco KUHRMANN, University of München, Germany
- MariaGrazia FUGINI, Politecnico di Milano, Italy
- Mehmet AKSIT, University of Twente, Netherlands
- Abdallah MERAOUMIA, University of Ouargla, Algeria
- Michael WHALEN, University of Minnesota, USA
- Mohamed GHERZOULI, University of Constantine 2, Algeria
- Mohamed Ridda LAOUAR, University of Tebessa, Algeria
- Mohamed K. KHOLLADI, University of El Oued, Algeria
- Mostafa EZZIYYANI, University of Tangier, Morocco
- Mohammad SADOGHI, University of Toronto, Canada
- Nacer eddine ZAROUR, University of Constantine 2, Algeria
- Nadjia BENBLIDIA, Saad Dahlab University, Algeria
- Nacira GHOUALMI, University of Annaba, Algeria

- Okba KEZAR, University of Biskra, Algeria
- Panos VASSILIADIS, University of Ioannina, Greece
- Philippe ANIORTE, University of Bayonne, France
- Philippe ROOSE, University of Pau, France
- Pierre-Jean CHARREL, University of Toulouse 2, France
- Rahma BOUAZIZ, University of Gabes, Tunisia
- Raouf BOUTABA, University of Waterloo, Canada
- Sahnoun ZAIDI, University of Constantine 2, Algeria
- Saliha AOUAT, USTHB, Algeria
- Salim CHITROUB, USTHB, Algeria
- Samira SI-SAID CHERFI, CRCST, France
- Slim KAMMOUN, University of Tunis, Tunisia
- Takashi MATSUHISA, INCT, Japan
- Terrill FRANTZ, Peking University, China
- Xavier FRANCH, Universitat Politècnica de Catalunya, Spain
- Younes LAKHRISSI, University of Fes, Morocco
- Zizette BOUFAIDA, University of Constantine 2, Algeria

Organization Committee:
- Abdeljalil GATTAL
- Abdelmalek METROUH
- Afrah DJEDDAR
- Akrem BENNOUR
- Chadia NASRI
- Chawki DJEDDI
- Fathi HEMIDANE
- Hakim GHARBI
- Imen BENATIA
- Issam BENDHIB
- Kamel HAOUAM
- Manel GHERARI
- Mohamed AMROUNE
- Med Saleh SOUAHI
- Med Yacine HAOUAM
- Salima BOUROUGAA
- Samir TAG
- Taher MEKHAZNIA

Keynotes Speakers

Takashi MATSUHISA. Institute of Applied Mathematical Research, Karelia Research Centre, Russian Academy of Sciences, Petrozavodsk, Karelia, Russia. Title: Common-Knowledge, Communication and Cooperation Management: Epistemic Approach to Cooperative Management. Abstract: Issues of moral hazard and adverse selection abound in each and every contract where one party has a self-interest and information that the other party does not possess. While this is a fertile research area, there is still a need for more information on how to handle a party to a contract who has more information than you. The moral hazard is very often the bottleneck, and buyer-supplier cooperation is an epitome of it. This paper re-examines the issue in the framework of a principal-agent model under uncertainty. It highlights epistemic conditions for a possible resolution of the moral hazard between the buyer and the suppliers. We show that the moral hazard disappears in the principal-agent model under uncertainty if the buyer and suppliers commonly know each agent's belief about the others' efforts, or if they communicate their beliefs about the others' efforts to each other through messages. Biography: Pr. Takashi Matsuhisa is Lecturer of Mathematics at the Department of Science, Ibaraki National College of Technology. He studied Mathematics at the University of Tsukuba, and holds a DrSc in Mathematics from Ritsumeikan University, Kyoto, Japan. His research interests lie at the interface of mathematical game theory, multi-modal logics, mathematical economics and management. He is a guest editor of the special issue Game Theory and Application of the Journal of Applied Mathematics. Hafid HAFFAF. Department of Computer Science, Oran Es-Senia, IGMO, Algeria. Title: Vehicular Network Infrastructure. Abstract: Vehicular networks will not only improve road safety; they will also offer new services to road users, making the journey enjoyable.
In this paper, a survey of the different technologies required to set up a vehicle communication infrastructure is presented. After defining the role and classification of intelligent transport systems, we shall see different kinds of protocols and mobility models. From VANETs to mobile ad hoc networks, what is needed to provide a communication platform which encompasses vehicle-to-vehicle and vehicle-to-infrastructure communication? We focus on the DSRC protocol as a model of road communication and briefly present an automated sea-port vehicular infrastructure. The well-known simulation tools are finally presented. Biography: Pr. Hafid HAFFAF obtained his Doctorate in Computer Science in 2000 and is a senior lecturer at the University of Oran Es-Senia (Algeria). He currently heads the R.I.I.R Laboratory at the Computer Science department of Oran University and is a delegate of the research agency ANVREDET, Algeria.

His research concerns different domains such as automatic control and diagnosis, optimisation and reconfiguration using matroid theory, and system-of-systems approaches and their applications in bond graphs and monitoring. He has many collaboration projects with European laboratories: Polytech Lille, where he worked on intelligent transport system infrastructures, and LIAU, Pau (France), in the domain of wireless sensor networks (CMEP project).

Contribution Papers

Topic 1: Software Engineering

Requirements of cross platform mobile development tools. Lamia GAOUAR, Department of Computer Science, University Abou Bekr Belkaid, Tlemcen, Algeria. Abdelkrim BENAMAR, Department of Computer Science, University Abou Bekr Belkaid, Tlemcen, Algeria. Fethi Tarik BENDIMERAD, Department of Telecommunication, University Abou Bekr Belkaid, Tlemcen, Algeria. Abstract: While smartphones have been gaining popularity in recent years, we have observed the emergence of several operating systems and device types, all evolving at different rhythms. Each platform comes with its own SDK tools and requires specific capabilities. It is a challenge for developers to provide applications that meet customer expectations in this competitive market. The real challenge arises when the application targets multiple platforms. Developing a native app for each platform separately requires considerable effort. Cross-platform mobile development approaches emerged to address this problem. Several tools implementing these approaches have appeared, with more or less success. Cross platform solutions can be recommended in general, but they differ in terms of performance, usability and the development environment provided. In this paper, we provide an overview of cross platform approaches. Afterward, we present several decision criteria regarding the desirable requirements of cross platform tools, followed by an analysis of some cross platform tools based on these requirements. Keywords: Cross-platform tools; mobile application; cross platform mobile development; requirements. I. INTRODUCTION For several years, we have observed a strong growth in the popularity of mobile devices such as smartphones and tablets [1]. This is due in large part to the applications developed for them, whose download rate is growing exponentially. But this growth is accompanied by the fragmentation of mobile platforms, which seriously complicates mobile application development.
Indeed, the mobile marketplace is divided between different mobile platforms and is evolving very quickly: there are at least ten different mobile platforms, but just four have a relevant number of users (Android, iOS, Windows Phone, BlackBerry OS) [1], knowing that some are destined to disappear and others to appear. Furthermore, each platform offers its own SDK tools, needs specific programming skills (in terms of both languages and APIs) and uses its own application market (see TABLE I) [2]. Companies are thus forced to develop and provide their solution on as many platforms as possible in order to target the most users. Providing a native specific solution is a main challenge for companies due to the cost, resources and time associated with the development activity for each platform. About 75-80% of the cost to develop for the first platform needs to be budgeted to build for the second platform [3].

TABLE I. MAJOR MOBILE PLATFORMS

OS                      | Programming Language | Development Environment        | Application Store
Google's Android        | Java, XML            | Eclipse, Android SDK           | Play Store
Apple's iOS             | Objective-C          | XCode                          | AppStore
Microsoft Windows Phone | Visual C#, C++       | Visual Studio                  | MarketPlace
RIM BlackBerry OS       | Java                 | MDS Studio, Plugin for Eclipse | BlackBerry App World

Cross-platform mobile development approaches emerged to address all or part of this challenge by allowing developers to implement their applications in one step for a range of platforms. Efforts and development repetitions for multiple platforms can be reduced and productivity increased by cross platform application development. Based on these approaches, more than 100 cross platform tools have emerged to help developers create cross platform mobile applications. Cross platform solutions can be recommended in general, but they differ in terms of performance, usability and the development environment provided.
Our paper presents an overview of existing approaches to cross-platform mobile development, divided into five different categories. Afterward, we identify the desirable requirements of a technology implementation pertaining to the development environment, API and documentation, and other criteria. We have also selected five cross platform tools (one per approach) that we will study based on these desirable requirements. This paper is organized as follows: the next section gives a description of existing approaches to cross platform mobile development. In section 3, we describe some related works. Then, we introduce the desirable requirements of cross platform tools in section 4. In the light of these requirements, we discuss five selected frameworks (one per approach) in section 5. We conclude this paper in section 6. II. OVERVIEW OF APPROACHES There are two ways to develop mobile applications for mobile devices: the native and the cross-platform approach [2]. The native approach, which permits the creation of native

applications, consists of developing the same application as many times as there are platforms, using, for each platform, its own Software Development Kit (SDK) and frameworks. For example, applications for Android are programmed in Java and XML, the platform functionality is accessed using the framework provided by Android, and the user interface is rendered using platform-provided elements. In contrast, applications for iOS are developed using the Objective-C programming language and Apple's frameworks. In contrast to the native approach, we present in this section an overview of the cross-platform approaches. The cross platform approach proposes to create a single application which can be used across multiple platforms. These approaches are designed to save time and costs by allowing developers to write an application once, in a language they know, using a framework which is adaptable to multiple platforms. For that purpose, different ways can be followed [4][5]: A. Web Approach The web approach consists of producing web applications that are designed to be executed in the web browser of mobile devices. Mobile web applications are developed using standard web technologies such as HTML, CSS and JavaScript. In this approach, the mobile device does not have any application-specific components installed. The client hosts the application user interface and the user data validation logic, while the server implements the business logic. The application is supported by the device browser; it is therefore platform independent. The main advantage of web applications is that they are based on web technologies and can thus be accessed in a similar way through mobile web browsers on all platforms. Since data and application are hosted on the server, no application update on the mobile device is required. The application is also easily monitored and maintained.
An important limitation of this approach is that the applications cannot access the mobile device hardware and software such as the address book, calendar, GPS sensors, etc. Web applications can also suffer from a lack of performance due to connection and network delays. Nowadays, users search application stores for the desired application. Considering that web applications are accessible using a URL and cannot be distributed through mobile application stores, this might have a negative impact on their popularity. Web applications usually do not look and feel like a native application, which can be prejudicial to the user experience. Several mature tools are dedicated to mobile web application development, among them: jQuery Mobile [6], jQTouch [7], Dojo Mobile [8], AppsBuilder [9], iBuildApp [10] and Sencha Touch [11], which is the tool that we have selected for our study. B. Hybrid Approach To resolve the lack of native functionality while still permitting the use of common web technologies, the hybrid approach emerged as a combination of web technologies and native functionalities. The hybrid approach uses the browser engine of the device, which renders and displays the HTML content in a native container on the mobile device, that is, a full-screen web view control. The device capabilities are exposed to the hybrid application through an abstraction layer. The abstraction layer exposes the device capabilities as JavaScript APIs. Unlike web applications, hybrid applications are downloaded and installed on the mobile device. In contrast to web applications, a hybrid application is distributable through application stores. Since native platform features are made available through the hardware abstraction layer, the application can make use of device features. However, hybrid applications are inferior in performance compared to native applications, since the execution happens in the browser engine.
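The abstraction layer described above can be sketched in a few lines of JavaScript. This is a minimal, illustrative sketch, not the API of PhoneGap or any real framework: the names __nativeBridge, getLocation and deviceCapabilities are hypothetical, and real bridges (e.g. Cordova plugins) are considerably more involved.

```javascript
// Hypothetical sketch of a hybrid abstraction layer: the container
// injects a "native bridge" object when the app runs on a device;
// off-device (plain browser, Node) the bridge is absent.
const nativeBridge = globalThis.__nativeBridge; // undefined off-device

// deviceCapabilities wraps each device capability behind a single
// JavaScript API, so application code never talks to the platform
// directly and stays identical on every platform.
const deviceCapabilities = {
  getLocation() {
    if (nativeBridge && typeof nativeBridge.getLocation === "function") {
      return nativeBridge.getLocation();           // native GPS path
    }
    return { lat: 0, lon: 0, source: "fallback" }; // web fallback path
  },
};

// Application code, unchanged across platforms:
const pos = deviceCapabilities.getLocation();
console.log(pos.source);
```

The point of the sketch is the dispatch: the same application call resolves to native code when the bridge is present and to a web implementation otherwise.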
The user interface may also lack the look and feel of a native application. To achieve the native look and feel, platform-specific styling might be required. The most popular exponent of this approach is PhoneGap [12], which we have chosen for our study. Among the existing tools based on this approach we can also cite frameworks such as MoSync [13]. There are some tools, such as mgwt [14] or Vaadin TouchKit [15], that more or less try to mimic the native user interface by using web technologies. However, these approaches suffer from running in a web environment, which remains their fundamental limitation. C. Interpreted Approach In the case of interpreted applications, the application code is deployed on the mobile device and, at runtime, an interpreter executes the code. The native features are made available through an abstraction layer. The interpreter interprets the source code at runtime across different platforms and thus supports cross platform application development. The interpreted application interacts with the abstraction layer to access the native APIs. In interpreted applications, the user interface is made up of platform-specific native elements for user interaction. In contrast to the previous approaches, an interpreted application provides the look and feel of a native application. An interpreted application is accessible through the application store. The device hardware and platform features are wrapped with a specific framework Application Programming Interface (API). The main disadvantage of interpreted applications is that the development depends on the feature set provided by the selected framework. The runtime interpretation of the code might degrade the performance of the application. Some fine-tuning can be required to tweak the look of the application. Appcelerator Titanium [16] is among the tools based on this approach. We will describe it in the appropriate section.
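The runtime mapping from a single abstract UI description to platform-specific native widgets can be illustrated with a toy interpreter. This is a sketch of the idea only: the widget class names are real platform classes, but the interpret function and the abstract tree format are hypothetical, not Titanium's actual API.

```javascript
// Toy sketch of the interpreted approach: one abstract UI description,
// resolved at runtime to the native widget classes of the platform
// the interpreter happens to be running on.
const widgetMaps = {
  android: { button: "android.widget.Button", label: "android.widget.TextView" },
  ios:     { button: "UIButton",              label: "UILabel" },
};

// The "interpreter": walks the abstract tree and resolves each node
// to the current platform's native widget class.
function interpret(uiTree, platform) {
  const map = widgetMaps[platform];
  return uiTree.map((node) => ({
    native: map[node.type], // platform-specific native element
    text: node.text,        // shared, platform-independent content
  }));
}

// The same abstract description serves every platform:
const app = [
  { type: "label",  text: "Hello" },
  { type: "button", text: "OK" },
];

console.log(interpret(app, "ios")[1].native); // "UIButton"
```

Because resolution happens at runtime rather than at build time, the application ships one description and gains native-looking elements on each platform, at the cost of the interpretation overhead the section mentions.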
Other tools that help to create interpreted applications are Adobe Flash Builder [17] and Rhodes [18]. D. Cross-compiled Approach The cross-compiled approach consists of providing a common programming language for specifying applications. The developers write the source code in a common programming language, and the cross-compiler compiles that source code into particular native code by converting it to native binaries. The main advantage of cross-compiled applications is that they provide all the features that native applications provide. Device hardware and software can be accessed. All the native user interface components can be used. Applications run with native performance and can be deployed on the application stores. Even if they are accessible, platform-specific features such as camera access, location services, local notifications, etc. cannot be reused: they are platform specific, and the way to access them varies from one platform to another. Identifying and rectifying issues in the cross-compilation phase will be difficult for developers. Applications need to be written in the language used by the cross-compiler. The cross platform solutions currently available on the market are not mature enough. A very prominent example of a cross-compiler tool is Xamarin [19], which is the framework that we chose for this approach. We can also add QT Mobile [20] as a cross-compiler tool. E. Model Driven Approach The model driven approach bases the development of applications on models used to describe the application. Using a DSL (more specific) or UML (more generic), a developer can describe an application at a high level, without having to deal with low-level technical issues such as how to fetch data or how to cache it. Automatic transformations generate source code for the supported target platforms from developer-defined models. Generated applications are truly native and respect the native look & feel. The main advantage of the model driven approach is that it provides a complete native application. The application runs in a native environment without an intermediate layer. The user can appreciate the native look & feel and native performance of the generated application. Since the generated application is a native application, it is visible in application stores and can be installed on the device. Developers have to maintain the model of the application to maintain the native code for each platform. Nevertheless, the model driven approach is limited to the application domain of the model language.
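The generation step of the model driven approach can be sketched as a transformation from a declarative model to per-platform source text. This is a deliberately tiny illustration: real model-driven tools (such as the AutoMobile project the paper studies) use far richer DSLs and templates; the model shape and generator names here are invented for the example.

```javascript
// Toy sketch of model-driven code generation: a declarative model of
// the app, transformed into (string-level) native source per platform.
const model = {
  app: "Notes",
  screens: [{ name: "Main", widgets: ["list", "addButton"] }],
};

// One generator per target platform; each emits native-flavoured
// source text from the same platform-independent model.
const generators = {
  android: (m) =>
    m.screens.map((s) => `class ${s.name}Activity extends Activity {}`),
  ios: (m) =>
    m.screens.map((s) => `class ${s.name}ViewController: UIViewController {}`),
};

function generate(m, platform) {
  return generators[platform](m).join("\n");
}

console.log(generate(model, "android"));
// "class MainActivity extends Activity {}"
```

The sketch also makes the section's limitations visible: the generators only handle what the model language can express (here, screens), and anything outside that vocabulary has to be hand-written in the native SDK and merged back in.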
Only applications that fall into the category supported by the model can be modeled. Otherwise, the language and the code generator must be enhanced. The generated code is still incomplete and must be completed manually with the native language and SDK tools. Thus, developers have to write this code for each platform individually. Integrating manually written code is still an issue. Even if most of the tools that implement this approach are still at an early stage and not yet very popular, we have still selected the AutoMobile Project [21] as representative of model driven cross platform development solutions for our study. Most existing model-driven solutions, like canappi mdsl [22] or applause [23], have not shown much development progress lately, or are not relevant in general practice, like mobl [24], or are not active anymore, like iphonical [25]. However, model-driven approaches appear to be the most promising approach for cross-platform mobile development and give rise to various research projects such as Xmob [26], MD² [27] or AXIOM [28]. III. RELATED WORK Research addressing the area of mobile development has evolved towards cross platform mobile development. Until recently, papers only discussed mobile platforms, as we can see with the works presented in [29] and [30]. For existing work dealing with cross-platform mobile development, we can refer to [31], which compares mobile platforms with regard to the openness of their architectures. In [2], the authors compare the development of native applications and web applications. Even if these works address more than one platform, they introduce the cross platform perspective only marginally.
While most articles dealing with cross-platform mobile development provide comparison criteria for cross-platform mobile development approaches, or evaluation criteria for technology implementations of these approaches [33][34], in this paper we focus on how to define the desirable requirements for cross platform mobile development tools. Indeed, with the emergence of more than 100 tools in this field, we think it is time to ask the question of the requirements that such tools must provide. Some related works can be identified when looking for the desirable requirements of cross platform tools. In [35], even if the authors discuss the desirable requirements of a cross platform framework, their aim is different. In fact, they developed an Android application with four cross platform tools (PhoneGap only, PhoneGap + jQuery Mobile, PhoneGap + Sencha Touch 2.0, Titanium) to evaluate the performance of such tools. In [33], the authors present an evaluation of three frameworks (Titanium, Rhodes, PhoneGap + Sencha Touch) versus two native SDKs (Android SDK, iOS SDK) on criteria pertaining to functionality, usability features, performance and other categories. In [4], the authors elaborate a list of 14 criteria for evaluating cross-platform development approaches. These criteria have been structured into an infrastructure and a development perspective. The infrastructure perspective sums up criteria relating to the life-cycle of an app, its usage, operation and functionality/functional range. The development perspective covers all criteria that are directly related to the development process of the app, e.g. topics like testing, debugging and development tools. These criteria can be seen as requirements of cross platform tools, but this is not explicitly mentioned because it is not the purpose of that article. In addition, not all approaches are addressed in the evaluation (just PhoneGap and Titanium are evaluated).
Although the topic seems similar to our work, our aim is different. In our work, we classify cross-platform development approaches into five distinct categories and include one tool of each approach in our analysis for an overall assessment. The evaluation of these tools is based on the requirements that we define in the next section. As we can see, previous research in this area is sparse and often focuses on comparing a set of existing frameworks. In this paper, we would like to provide the desirable requirements of cross-platform tools. The aim is to identify the elements required in the process of developing a cross-platform mobile application, and thus the elements that a tool should provide to claim the title of cross-platform tool. Our work can therefore be considered complementary.

IV. REQUIREMENTS OF CROSS-PLATFORM TOOLS

The main objective of cross-platform mobile development approaches is to provide a mobile application that can execute on multiple platforms. However, we must not lose sight of the fact that the main motivation is to provide a cross-platform application which is closest to the concept of a native application. Native mobile applications are specifically built for a device's operating system; the application is downloaded from an app store and resides on the device. On the one hand, native applications: 1) have access to all native functionality of the device (GPS, camera, calendar, file system, etc.); 2) are installed on the device and accessible offline; 3) offer the native look & feel widely appreciated by users [2][36]. On the other hand: 1) native applications are able to run only on the operating system for which they are designed, which obliges companies to produce the same application once for each target mobile platform; 2) uncovered platforms represent lost customers for companies; 3) maintaining separate applications requires development, testing and distribution each time a new OS release or device model becomes available [2][36][3]. The question of supporting multiple platforms is not new: the same problem occurred 20 years ago for PC platforms (Windows, Unix, Mac OS, etc.). With the omnipresence of mobile devices in our daily life and the fragmentation of platforms, it makes sense that developers are turning to multi-platform development and that framework vendors provide appropriate solutions [34]. Based on the literature and our own analysis, we have identified the following desirable requirements for any cross-platform technology.

A. Mobile platforms supported

Cross-platform approaches by definition must support several platforms.
This requirement takes into account the number and importance of supported mobile platforms; e.g. iPhone, Android and Windows Phone are practically mandatory since they are the most widely used mobile platforms. In addition, the equality of support between the platforms should also be considered.

B. Development environment

The development environment covers various parameters. In this requirement, we consider the features of the development environment offered with the framework, such as an Integrated Development Environment (IDE), debugger, compiler, emulator, etc., and functionalities such as a source code editor with intelligent auto-completion, which usually accompanies the IDE. In addition to a source code editor, the opportunity to create the graphical user interface through a WYSIWYG editor and to test the application without having to constantly deploy it to a device or an emulator is greatly appreciated. Also, the maturity of the development environment reflects the maturity of the framework.

C. API and documentation

In this requirement, we discuss the quality of the framework's documentation and the APIs available. The influence of the documentation on the quality and ease of learning is reflected in the progress developers make while learning a framework. The APIs used determine the features of the developed application. APIs are specific to devices/platforms, and their availability varies from one framework to another.

D. Security

Smartphones are considered easy-to-trap objects whose flaws are already known and exploited (conversation tracking, data recovery, scams). Applications developed with cross-platform tools are not highly secure [37]. Considering that each mobile OS and each mobile device has its own flaws, it is difficult to apply a security policy to an application designed to run on multiple platforms and mobile devices.
The ideal would be to introduce the concept of security at the heart of the application's development process [38][39][30]. Proper research needs to be carried out to secure the tools and applications.

E. Access to device-specific features

The kind of application determines its capacity to access the features of the mobile device. There is a difference between the features available to a native application and to a web application. The functional requirements of an application can be identified as follows: 1) informational requirements, where the user primarily consumes content; 2) transactional requirements, where the user primarily interacts with the application to accomplish a task; and 3) device-specific requirements, e.g. offline transaction entry or file access. Most frameworks support standard device features, e.g. camera, GPS, accelerometer, etc., and provide access to such features through an intermediate layer of APIs.

F. Resource consumption

Resource consumption relates to the application developed with the cross-platform framework. This requirement includes, in order: memory usage, CPU usage and power consumption [37]. Mobile phones, like other pervasive devices, suffer from resource shortages, and these resources can influence each other; for example, CPU utilization affects battery consumption. Several studies have been undertaken on this subject [40][41][42]. Memory usage may increase for various reasons, e.g. the addition of features to generate the user interface or the use of HTML and JavaScript files. Several research works [43][44][45][46] have appeared recently dealing with the power consumption of mobile applications. In mobile devices, power is the most important resource [47], and applications developed using cross-platform tools must use the battery of mobile devices effectively.

G. Look and feel

The success of an application depends in large part on user experience.
Currently, cross-platform tools try to reproduce as closely as possible the look and feel of native applications. Most users seek applications that resemble native applications in terms of graphical user interface and reactivity. Indeed, on a mobile device applications can be interrupted by events such as a call or an SMS; after reacting to such an event, the user wants to return to the application where he left it. For that purpose, support for backend communication protocols and data formats becomes mandatory.

H. Visibility

The way the application is distributed determines its visibility. Generally, users turn to the app stores of the mobile platforms to obtain an application, whereas web applications are accessible only through a URL and an internet connection. In addition, this requirement also determines how the application is updated and maintained; i.e., no application update on the mobile device is required for a web application, since data and application are hosted on the server.

V. DISCUSSION

In this section, we have selected five cross-platform tools, one per approach, in order to describe them according to the desirable requirements detailed in the previous section. The main criteria for selecting these tools were their popularity and extensive use, especially for the first four. For the last one, even if model-driven tools are not mature enough, the AutoMobile Project was selected because it appears to be the most promising tool in its category. We do not aim to compare these selected tools; rather, our objective is to provide, for each approach, an overview of the existing cross-platform tools based on what we have presented as the necessary requirements that must be met by any cross-platform tool.

A. Example of web approaches: Sencha Touch

Web applications are most often built with frameworks such as Sencha Touch [11], which allows creating free web applications. Sencha Touch is an HTML5 mobile application framework for building web applications. Sencha Touch 2.3.1, the latest version at the time of this writing, supports the Android browser, Google Chrome for Android, BlackBerry 10, the Bada Mobile Browser, the Kindle Fire Browser, Windows Phone 8 and Mobile Safari. In fact, this version of Sencha Touch only targets WebKit browsers.
Sencha Touch can be used with Apache Cordova/PhoneGap or Sencha's native packager. Either enables packaging the application in a native container and gives access to selected device-level APIs unavailable to traditional web applications. However, the application suffers from a lack of performance due to execution in the browser. Sencha does not provide an IDE for developing Touch applications, but IDEs such as NetBeans, WebStorm and Aptana can be used, although none of them provides a great programming experience. In addition, Sencha provides a plugin for the Eclipse environment.

B. Example of hybrid approaches: PhoneGap

This framework is based on the open source Cordova project. PhoneGap currently (version 3.3.0) supports 7 mobile platforms: Android, iOS, BlackBerry, webOS, Windows Phone 7 and 8, Symbian and Bada. It allows developers to create mobile applications using modern web technologies such as HTML5, CSS3 and JavaScript [48]. PhoneGap is a web code wrapper, i.e., PhoneGap operates by packaging the web page with the PhoneGap engine specific to each supported platform. This engine displays the page in a regular web view. Additionally, PhoneGap provides access to device functionality such as the accelerometer, camera, contacts, GPS, etc. through JavaScript APIs. The API is implemented differently for each platform; this is why the resulting application is hybrid, i.e., it is neither purely native nor purely web [34]. The result is a binary application archive that can be distributed through the application market. PhoneGap does not offer a centralized development environment, because it does not provide an IDE to develop applications. Instead, it provides a service called PhoneGap Build that allows developers to compile their applications in the cloud. Developers can also choose an IDE to write the source code and take it to an appropriate IDE for the target platform, e.g. Eclipse for Android, for additional code modifications.
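As a concrete illustration of this intermediate JavaScript API layer, the sketch below (in TypeScript) wraps the callback-style W3C Geolocation API that PhoneGap/Cordova exposes as navigator.geolocation into a Promise. The GeolocationLike interface is our own simplification, introduced only so the pattern can be shown without a device; it is not part of the Cordova API.

```typescript
// Minimal sketch of calling a device-level API through PhoneGap's JavaScript layer.
// `GeolocationLike` is an assumed simplification of `navigator.geolocation`;
// the callback signature and option fields follow the standard W3C Geolocation API.
interface GeoCoordinates {
  latitude: number;
  longitude: number;
}

interface GeoPosition {
  coords: GeoCoordinates;
}

interface GeolocationLike {
  getCurrentPosition(
    success: (pos: GeoPosition) => void,
    error?: (err: unknown) => void,
    options?: { enableHighAccuracy?: boolean; timeout?: number }
  ): void;
}

// Wrap the callback-style API into a Promise for easier application code.
function currentPosition(geo: GeolocationLike): Promise<GeoPosition> {
  return new Promise((resolve, reject) =>
    geo.getCurrentPosition(resolve, reject, { enableHighAccuracy: true, timeout: 10000 })
  );
}
```

On a device, the same function would be called as currentPosition(navigator.geolocation) once Cordova's deviceready event has fired.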
Even if applications developed with PhoneGap are elaborate, they have more the look and feel of a web application than of a native application. PhoneGap is more suitable for projects where cross-platform reach is more important than high performance and user experience [34].

C. Example of interpreted approaches: Appcelerator Titanium

Appcelerator Titanium [16] is a development environment, developed by Appcelerator Inc., for creating native applications across different platforms. Titanium applications are developed using web technologies such as HTML5, JavaScript and CSS3. It uses JavaScript APIs and platform APIs to build the interface and to access native device features. Titanium links JavaScript to native libraries, compiles it to bytecode, and then the platform SDK builds the package for the desired target platform. The code is then packaged with Titanium's engine. At runtime, this engine interprets the JavaScript code and creates the user interface. The final application resembles the typical platform appearance because it uses native elements to build the user interface. However, application performance can be disappointing due to the interpretation of the source code every time the application runs. Titanium also includes Titanium Studio, Appcelerator's IDE, which allows writing, testing and debugging mobile applications and also offers the possibility to run and deploy them [49][50][16]. A small number of platforms are supported by the latest stable release of Titanium: iOS, Android, Windows Phone, BlackBerry OS and Tizen.

D. Example of cross-compiled approaches: Xamarin

Xamarin [19] is a development environment for cross-compiled applications that allows developing applications using the C# language. Xamarin 3, the latest version of the framework, supports iOS, Windows Phone and Android. To run the application on the device, the Xamarin cross-compiler translates the application code written in C# into binary code that can run on the target platform. Xamarin applications are built with native user interface controls; thus, applications not only look like native applications, they behave that way too. Xamarin also allows access to all native device and platform features by providing all the native APIs of the supported platforms as C# libraries. Xamarin applications leverage platform-specific hardware acceleration and are compiled for native performance [32]. Xamarin also offers its own IDE, named Xamarin Studio, which provides a convivial development environment with functionalities such as code completion, a debugger, etc. Xamarin Studio also allows distributing applications through the application stores [51].

E. Example of model-driven approaches: AutoMobile

The AutoMobile Project [21] appears to be one of the most active and most promising in its category. AutoMobile is funded by the European Commission and exploits the modern paradigm of Model-Driven Engineering and code generation. It is based on abstraction, modeling and code generation to represent applications in a platform-independent manner and then generate the code to deploy the application and its interaction logic onto the target platform. AutoMobile relies on modeling languages such as IFML (Interaction Flow Modeling Language) [52] and on tools like WebRatio [53]. AutoMobile proposes: 1) a platform-independent modeling language based on OMG standards (MDA, UML, IFML) for modeling applications; 2) a set of software components and an architectural framework acting as technical building blocks, based on HTML5 and also targeting native applications (iOS and Android); 3) a model-to-code generator consisting of a set of model transformations integrated in the existing WebRatio platform [54]. AutoMobile targets only the iOS and Android platforms for now.
Our discussion shows that although the user experience is not as good as with native applications, cross-platform applications can be deployed on several platforms at once to reach most potential users, which is essential for application vendors. Documentation and additional developer support are available for all evaluated solutions, even for the young cross-platform frameworks. Cross-platform tools are mostly mature and provide a comfortable development environment. The main disadvantage that we can observe with the evaluated tools is that the ability to access device-specific features depends on the APIs available in the framework. Due to the cross-platform implementation, the performance of cross-platform applications is adversely affected compared to native applications. Most cross-platform tools do not support native interface design without going through intermediate layers. Native elements are desirable to present users with a more familiar interface; using native elements also ensures that an application follows the user interface guidelines and thus does not risk being rejected from the app stores. Finally, the question of application security is rarely addressed. While native applications are highly secure, securing applications developed with tools based on web technologies depends on browser security. Interpreted tools are likely to be more secure, and better security can be offered by cross-compiled and generator tools [35].

VI. CONCLUSION

Cross-platform solutions are recommended when an application is dedicated to multiple platforms under time and cost limitations. "Develop once and run anywhere" is an important concept, and there are different approaches for achieving it. Cross-platform solutions are based on five distinct approaches. This paper presented an overview of cross-platform mobile development approaches.
Drawing on that overview, we identified the desirable requirements of cross-platform technology. Our analysis of five selected cross-platform tools according to these requirements showed that cross-platform solutions can be recommended in general, even if none of the tools satisfied all the requirements. Development of cross-platform mobile applications is an emerging and attractive field, and many tools have been developed to provide appropriate solutions. Consequently, it is difficult for developers and companies to make a rational choice without relying on established evaluation points. We think that defining the requirements that tools supporting cross-platform mobile development should provide is a first step towards the definition of norms, enabling the standardization of the cross-platform mobile development process. In future work, requirements such as security will be studied and will be subject to further research. We aim to extend our analysis to other tools for cross-platform mobile development. We hope that additional experiments will strengthen the definition of the desirable requirements of cross-platform mobile development tools.

REFERENCES
[1] Gartner: Gartner Says Smartphone Sales Accounted for 55 Percent of Overall Mobile Phone Sales in Third Quarter of. Accessed on April.
[2] Andre Charland and Brian LeRoux, "Mobile application development: Web vs. Native," Communications of the ACM, Volume 54, Issue 5, May.
[3] RapidValue Solutions, How to Choose the Right Architecture For Your Mobile Application. Accessed on April.
[4] Henning Heitkötter, Sebastian Hanschke and Tim A. Majchrzak, "Comparing cross-platform development approaches for mobile applications," in Web Information Systems and Technologies, LNBIP, Springer.
[5] Peter Friese, Cross-Platform Mobile Development: Overview. Accessed on April.
[6] jQuery Mobile. Accessed on April.
[7] jQTouch. Accessed on April.
[8] Dojo Toolkit. Accessed on April.

[9] AppsBuilder. Accessed on April.
[10] iBuildApp. Accessed on April.
[11] Sencha Touch. Accessed on April.
[12] PhoneGap. Accessed on April.
[13] MoSync. Accessed on April.
[14] mgwt. Accessed on April.
[15] Vaadin TouchKit. Accessed on April.
[16] Appcelerator Titanium. Accessed on April.
[17] Adobe Flash Builder. Accessed on May.
[18] Rhodes. Accessed on April.
[19] Xamarin. Accessed on.
[20] QTMobile. Accessed on April.
[21] AutoMobile: Automated Mobile App Development. Accessed on June.
[22] mobl. Accessed on June.
[23] applause. Accessed on April.
[24] Canappi: Creating a mobile application with mdsl. Accessed on June.
[25] iphonical. Accessed on April.
[26] Olivier Le Goaer, Sacha Waltham, "Yet Another DSL for Cross-Platforms Mobile Development," in the 1st Workshop on Globalization of Domain Specific Languages (GlobalDSL'13), Montpellier, France, July 2013, in press.
[27] Henning Heitkötter, Tim A. Majchrzak and Herbert Kuchen, "Cross-Platform Model-Driven Development of Mobile Applications with MD2," in the 28th Annual ACM Symposium on Applied Computing (SAC '13).
[28] Xiaoping Jia and Chris Jones, "Cross-platform application development using AXIOM as an Agile Model-Driven Approach," in the 7th International Conference, ICSOFT 2012, Rome, Italy, July 24-27, 2012, in press.
[29] Yun Chan Cho, Jae Wook Jeon, "Current software platforms on mobile phone," in International Conference on Control, Automation and Systems (ICCAS) 2007, Oct.
[30] Feida Lin, Weiguo Ye, "Operating System Battle in the Ecosystem of Smartphone Industry," in International Symposium on Information Engineering and Electronic Commerce, 2009.
[31] Mohsen Anvaari, Slinger Jansen, "Evaluating architectural openness in mobile software platforms," in the 4th European Conference on Software Architecture, ACM.
[32] Xamarin: Build apps in C# for iOS, Android and Windows Phone. Accessed on June.
[33] Andreas Sommer and Stephan Krusche, "Evaluation of cross-platform frameworks for mobile applications," in the 1st European Workshop on Mobile Engineering, February.
[34] Sarah Allen, Vidal Graupera and Lee Lundrigan, Pro Smartphone Cross-Platform Development: iPhone, BlackBerry, Windows Mobile, and Android Development and Distribution. Apress, September.
[35] Isabelle Dalmasso, Soumya Kanti Datta, Christian Bonnet and Navid Nikaein, "Survey, Comparison and Evaluation of Cross Platform Mobile Application Development Tools," 9th International Wireless Communications and Mobile Computing Conference (IWCMC 2013), July 2013, Sardinia, Italy, in press.
[36] Rahul Raj C.P, Seshu Babu Tolety, "A study on approaches to build cross-platform mobile applications and criteria to select appropriate approach," in Annual IEEE India Conference (INDICON), December 2012.
[37] Dalmasso, S. K. Datta, C. Bonnet, N. Nikaein, "Survey, Comparison and Evaluation of Cross-Platform Mobile Application Development Tools," Wireless Communications and Mobile Computing Conference (IWCMC), vol. 2013, July.
[38] Denim Group, Secure mobile application development reference.
[39] Ann Cavoukian, Marc Chanliau, "Privacy and security by design: a convergence of paradigms," January.
[40] Wanghong Yuan, Klara Nahrstedt, "Energy-efficient soft real-time CPU scheduling for mobile multimedia systems," in the nineteenth ACM Symposium on Operating Systems Principles, October 19-22, 2003, Bolton Landing, NY, USA.
[41] Reza Rawassizadeh, "Mobile application benchmarking based on the resource usage monitoring," International Journal of Mobile Computing and Multimedia Communications, 1:64-75.
[42] Rahul Murmuria, Jeffrey Medsger, Angelos Stavrou and Jeffrey M. Voas, "Mobile application and device power usage measurements," in the IEEE Sixth International Conference on Software Security and Reliability, June 20-22.
[43] Soumya Kanti Datta, "Android stack integration in embedded systems," in International Conference on Emerging Trends in Computer & Information Technology, Coimbatore, India.
[44] Rimpy Bala and Anu Garg, "Battery Power Saving Profile with Learning Engine in Android Phones," International Journal of Computer Applications 69(13):38-41, May. Published by Foundation of Computer Science, New York, USA.
[45] Soumya Kanti Datta, Christian Bonnet and Navid Nikaein, "Android power management: Current and future trends," in the First IEEE Workshop on Enabling Technologies for Smartphone and Internet of Things (ETSIoT), pp. 48-53, 18 June.
[46] Jaymin Lee, Hyunwoo Joe and Hyungshin Kim, "Smart phone power model generation using use pattern analysis," 2012 IEEE International Conference on Consumer Electronics (ICCE), pp. 412-413, Jan.
[47] Robin Kravets, P. Krishnan, "Power management techniques for mobile communication," in the Fourth Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), Dallas, TX, October.
[48] PhoneGap: About the Project. Accessed on May.
[49] Gustavo Hartmann, Geoff Stead and Asi DeGani, "Cross-platform mobile development," Mobile Learning Environment, Cambridge, March.
[50] VisionMobile, Cross-Platform Developer Tools 2012: Bridging the worlds of mobile apps and the web, February.
[51] Xamarin: Xamarin Studio. Accessed on May.
[52] IFML: The Interaction Flow Modeling Language. Accessed on June.
[53] WebRatio. Accessed on June.
[54] AutoMobile: Results. Accessed on June.

Design of a New Smart-Irrigation System in the South of Algeria

Dr. Benahmed Khelifa, Dept of Science, University of Bechar, Bechar, Algeria
Douli Amel, Dept of Science, University of Bechar, Bechar, Algeria

Abstract — Food self-sufficiency has been one of the main objectives of the agricultural policy of Algeria since the early years of independence; in recent years, the country has successfully begun to exploit the huge potential of Saharan agriculture. The south of Algeria meets all the conditions: there is plenty of land, water and light, three essential elements for agriculture. Lack of water is one of the most serious threats to agriculture and the environment in several countries. Agriculture is the largest consumer of water, representing up to 85% of the consumption of water resources. The adoption of appropriate water-saving policies should minimize the consequences. In this regard, we propose in this paper a new water-saving smart irrigation system. This system is designed on the basis of information and communication technologies for environmental monitoring, using ZigBee wireless sensor networks (WSNs) and renewable energy technology. Smart algorithms are proposed in this paper for the automation and management of irrigation.

Index Terms — Smart water, smart irrigation, renewable energy, wireless sensor networks.

I. INTRODUCTION

Agriculture uses 85% of available freshwater resources worldwide, and this percentage will continue to be dominant in water consumption because of population growth and increased food demand [1]. In Algeria, liberalization reforms undertaken since the 1990s have resulted in negative effects on the development and management of irrigation schemes as well as on the conditions of farmers [2], [3], [4]. In addition, Algeria is ranked among the most water-deficient countries, due to its membership in the Middle East and North Africa ("MENA") geographic area. Thus, Algerian agriculture cannot cover the needs of the population, especially regarding strategic and widely consumed foods.

In our study area, the Algerian Sahara has known various agricultural problems caused by drought lasting several months, water shortage and the traditional method of irrigation, which makes water utilization efficiency extremely low. The aim of our work in this paper is to propose a new smart irrigation system based on wireless communication, using ZigBee wireless sensor networks (WSNs) and renewable energy technologies such as solar energy, wind energy, etc., for sustainable use of water and to facilitate agricultural activity through automatic irrigation of agricultural land. Using solar energy solves the inconvenient problem of laying wires through the farmland, and the self-powered devices can measure the soil moisture continuously. This system has the potential to be useful in water-scarce areas to avoid waste of water resources. The rest of this paper is organized as follows. Existing techniques for saving water are reviewed in Section II. The existing state of agriculture in the south of Algeria is presented in Section III. We then describe our proposed approach in Section IV. Finally, we summarize the paper in the last section.

II. RELATED WORKS

There are many systems to achieve water savings in various crops, from basic ones to more technologically advanced ones. For instance, in one system plant water status was monitored and irrigation scheduled based on the canopy temperature distribution of the plant, which was acquired with thermal imaging [5]. In addition, other systems have been developed to schedule irrigation of crops and optimize water use by means of a crop water stress index (CWSI) [6]. The empirical CWSI was first defined over 30 years ago [7]. This index was later calculated using measurements of infrared canopy temperatures, ambient air temperatures, and atmospheric vapor pressure deficit values to determine when to irrigate broccoli using drip irrigation [8]. Irrigation systems can also be automated through information on the volumetric water content of soil, using dielectric moisture sensors to control actuators and save water, instead of a predetermined irrigation schedule at a particular time of the day with a specific duration. An irrigation controller is used to open a solenoid valve and apply watering to bedding plants when the volumetric water content of the substrate drops below a set point [9]. In [10], a water-saving irrigation system is designed based on wireless communication and solar energy technology, using a ZigBee wireless sensor network to collect soil temperature and moisture information and transmitting the data to a remote monitoring computer over the GPRS network. The administrator can monitor the soil parameters and control the irrigation on the remote computer according to the specific needs of the plants. Another irrigation system was developed in [11] based on fuzzy control technology and a wireless sensor network. The fuzzy controller embedded in the coordinator node carries out fuzzy inference and fuzzy decision on soil moisture information in order to decide whether or not to conduct water and how long the irrigation time is.

23 III. AGRICULTURE IN SOUTH OF ALGERIA The amount of water required for irrigation of agricultural land depends on climatic conditions, soil texture and irrigation technique used. The soil is considered of medium type "sandy loam," allowing good water retention and preventing deep percolation. The traditional surface irrigation is still use d in south of Algeria. In 2000, the Algerian government initiated a National Plan for Agricultural Development (PNDA).The plan aims to rehabilitate irrigation systems, to reduce industrial water consumption through the introduction of d rip-irrigation systems and to increase agricultu ral workers' incomes to stem rural flight. Drip irrigation is based on monitoring the wetness of the fraction of the soil occupied by the roots rather than the entire surface. By this technique, the wa ter is dis tributed to the soil surface by drippers, placed at the base of each plant [12], which each provide a low speed so that the duration of irrigation becomes almost quasi-continuously. Sources of water for irrigation can be wat er under land, or water collected through collection systems rainwater. Water can be extracted by pumps from wells, or by gravity. implementation of au tomatic watering. The controller of the electric pump is responsible to enable or disable the pump, based on information from the senso r detection of t he water well level and the sensor detection of the water tank level. The sensor network is low power consumption and use a lead acid battery charged by the solar cell panel. The electric pum p can be c harged either by renewable energy or by electric power. The figure 2. shows our system structure. Fig. 2. The system structure. Fig. 1. Agriculture in the south of Algeria. IV. OUR PROPOSED APPROACH A. The system design The whole system is composed of wireless sensor network system and renewable energy power system. There are two parts which are source of water for irrigation and agricultural land. 
Wireless sensor network consists of se nsor node soil moisture, sensors detection of the water level, irrigation controller and controller of t he electric pum p. Irrigation pipe network is laid over the irrigation areas and sol enoid valve controllers are i nstalled on pi pelines. The sensor no de soil moisture and t he sensor det ection of t he water tank level measure soil moisture and the water level respectively, transmit information to the solenoid valve controller which controls the In the water scarce area far from farmer, this system can be used broadly and to avoid waste of water resources. B. The design of the soil moisture sensors and irrigation controller We adopt the MSP430F2274 low power consumption 16- bit MCU as th e node controller. This part is mainly in charge of the connection of soil moisture sensors and solenoid valve controller as well as th e data transmission of th e radio frequency (RF) communication module (C51RF-3-CS CC2430). The 2.4G r od antenna is used t o enlarge the communicating distance. The C C2430 is the first 2.4 GHz System on Chip (SOC) which meets the ZigBee standard and compatible with the 2.4G IEEE com munication protocol [13-14]. The soil moisture sensor been selected is M 406B.The working voltage is 1 2V, the output voltage is 0 ~1V, and the measuring range is from 0 to 100%. The output voltage is sampled by the i nner AD converter of t he MSP430F2274 controller at 200Ksps. [10] The solenoid valve which can switch the water co nduit connected with a rotating micro- nozzle. When the farmland needs to be irrigated, the solenoid valve is set open by the MSP430F2274. And the water sprays from the nozzle, drives 17 IT4OD 2014

the refraction arm rotation. The effective radius is thus expanded and the irrigation intensity can be lower. Using this kind of nozzle saves a large amount of water [10]. To overcome the inconvenience of replacing the battery frequently, a maintenance-free lead-acid battery (12 V, 7 Ah) is used, recharged by a solar cell panel [15]. The solar cell is a CY-TD02-6, which works at 18 V with an output power of 2 W.

C. The design of the water level sensors and electric pump controller

Taking advantage of the electrical conductivity of water, we use copper conductors as the water level sensor. When water touches the copper sensor positioned at a particular level in the tank, a voltage is transferred to the copper and in turn to the comparator circuit for further processing. An LM324 comparator compares the inputs from the electrodes in the tank with a pre-set resistance and outputs a HIGH or a LOW according to the result of the comparison. This HIGH or LOW is fed into the microcontroller, which uses it to control the water pump and display the appropriate status on an LCD screen. The programmable Atmel 89C52 microcontroller, programmed in assembly language, is the processor that controls the functionalities of the entire system. A Liquid Crystal Display (LCD) serves as the output unit, showing the status of the system on a screen. Relays are used to build a switching unit that triggers the pump on or off, depending on the signal received from the microcontroller.

V.
SOFTWARE SYSTEM DESIGN

The soil moisture sensor node and the water tank level sensor collect soil moisture and water level at each sampling cycle and transmit the data to the solenoid valve controller node. This node automatically activates irrigation according to threshold values of soil moisture and water tank level, deciding whether to conduct water and for how long to open the solenoid valve to irrigate the corresponding region. The system thus implements water-saving irrigation of crops. After a sampling period, each sensor node immediately enters hibernation to conserve energy until the next cycle. The sensor nodes detecting the water level in the tank and the well measure the water level and transmit this information to the electric pump controller node, which decides whether to start the electric pump. Flow charts of the software design for the sensor nodes, the solenoid valve controller node and the electric pump controller are shown in Fig. 3, Fig. 4, Fig. 5, Fig. 6 and Fig. 7.

Fig. 3. Software flow chart of the soil moisture sensor node.

Fig. 4. Software flow chart of the water tank level sensor.
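The two decision rules described above can be sketched as small functions. This is a minimal sketch only: the function names, the direction of the moisture comparison, and the hysteresis between tank minimum and maximum are illustrative assumptions drawn from the flow charts, not code taken from the paper's implementation.

```python
# Hedged sketch of the controller logic described by the flow charts.
# All names and the comparison directions are illustrative assumptions.

def irrigation_decision(soil_moisture, tank_level,
                        moisture_threshold, tank_min):
    """Open the solenoid valve only when the tank holds enough water
    and the soil is dry enough to need irrigation."""
    return tank_level > tank_min and soil_moisture < moisture_threshold

def pump_decision(tank_level, well_level,
                  tank_min, tank_max, well_min, pump_on):
    """Start the pump when the tank is low and the well has water;
    stop it once the tank reaches its maximum; otherwise keep state."""
    if tank_level < tank_min and well_level >= well_min:
        return True
    if tank_level >= tank_max:
        return False
    return pump_on
```

The hysteresis band (tank_min/tank_max) mirrors the two thresholds of the water tank level sensor in Fig. 4 and Fig. 7, preventing the pump from rapidly cycling on and off around a single level.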

Fig. 5. Software flow chart of the irrigation controller node.

Fig. 6. Software flow chart of the water well level sensor.

Fig. 7. Software flow chart of the electric pump controller node.

VI. CONCLUSION

In this paper we proposed an efficient water-saving irrigation system based on wireless communication and renewable energy; this system can be easily deployed in various parts of the agricultural land. The wireless network is based on ZigBee wireless sensors. We treated two parts in this system: the first automatically fills the water tank from the well, while the second irrigates the agricultural land with automatic sprinklers according to the specific needs of the plants. In water-scarce areas far from the farmer, this system can be used broadly. As a perspective, we plan to implement our solution in simulators dedicated to this application and then apply it in the agricultural lands of the south of Algeria.

REFERENCES

[1] W. A. Jury and H. J. Vaux, "The emerging global water crisis: Managing scarcity and conflict between water users," Adv. Agronomy, vol. 95, pp. 1-76.
[2] Conseil National Economique et Social (CNES), "L'eau en Algérie : le grand défi de demain," 15th session, national report, Alger, May.

[3] Conseil National Economique et Social (CNES), "Problématique de développement agricole : éléments pour un débat national," 14th session, national report, Alger, Nov.
[4] O. Bessaoud, "L'agriculture algérienne : des révolutions agraires aux réformes libérales de 1963 à 2002," in Du Maghreb au Proche-Orient, les défis de l'agriculture, L'Harmattan, Paris, 2002.
[5] X. Wang, W. Yang, A. Wheaton, N. Cooley, and B. Moran, "Efficient registration of optical and IR images for automatic plant water stress assessment," Comput. Electron. Agricult., vol. 74, no. 2, Nov.
[6] G. Yuan, Y. Luo, X. Sun, and D. Tang, "Evaluation of a crop water stress index for detecting water stress in winter wheat in the North China Plain," Agricult. Water Manag., vol. 64, no. 1, Jan.
[7] S. B. Idso, R. D. Jackson, P. J. Pinter, Jr., R. J. Reginato, and J. L. Hatfield, "Normalizing the stress-degree-day parameter for environmental variability," Agricult. Meteorol., vol. 24, Jan.
[8] Y. Erdem, L. Arin, T. Erdem, S. Polat, M. Deveci, H. Okursoy, and H. T. Gültas, "Crop water stress index for assessing irrigation scheduling of drip irrigated broccoli (Brassica oleracea L. var. italica)," Agricult. Water Manag., vol. 98, no. 1, Dec.
[9] K. S. Nemali and M. W. Van Iersel, "An automated system for controlling drought stress and irrigation in potted plants," Sci. Horticult., vol. 110, no. 3, Nov.
[10] Li Wenyan, "Design of wireless water-saving irrigation system based on solar energy," in Control, Automation and Systems Engineering (CASE), International Conference on, IEEE, pp. 1-4, July 2011.
[11] Peng Xiaohong, Mo Zhi, Xiao Laisheng, and Liu Guodong, "A water-saving irrigation system based on fuzzy control technology and wireless sensor network," in Wireless Communications, Networking and Mobile Computing, WiCom '09, 5th International Conference on, IEEE, pp. 1-4, Sept.
[12] L. Zella and A.
Kettab, "Numerical methods of micro irrigation lateral design," Revue Biotechnologie Agronomie Société Environnement, vol. 6, no. 4.
[13] Zhang X., Fang J., Yu X., "Design and implementation of nodes based on CC2430 for the agricultural information wireless monitoring," Singapore: IEEE Computer Society.
[14] Xu H., Luo J., Luo M., "Mobile node localization algorithm in wireless sensor networks for intelligent transportation systems," Hong Kong: IEEE Computer Society.
[15] Valente A., Morais R., Serodio C., et al., "A ZigBee sensor element for distributed monitoring of soil parameters in environmental monitoring," Atlanta, GA, United States: Institute of Electrical and Electronics Engineers Inc.

The Evolution of Collaborative Tagging Systems and the Relation with the Emergence of Language: Agent-Based Modeling

Ahmed Mokhtari, École nationale Supérieure d'Informatique (ESI, ex. INI), Algiers, Algeria
Salima Hassas, University Claude Bernard - Lyon 1, Lyon, France

Abstract — In collaborative tagging systems, and in the social web in general, users annotate resources using keywords (tags) in order to categorize and organize their personal resources. In this paper, we study the emergence of a tagging language formed by (tag/resource) couples shared by the users of the tagging system. Using agent-based modeling, following the perspective of complex systems, we show that a shared language can emerge in a collaborative tagging system.

Keywords — social web, collaborative tagging systems, language emergence, collaborative tagging, tag, multi-agent systems.

I. INTRODUCTION

In collaborative tagging systems, and in the social web in general, users annotate resources with tags to create a categorization system. To what extent do the proposed tags help the emergence of a tagging language shared by the users of the collaborative tagging system? Our motivation for this work is to improve existing tagging systems with an emergent language that is established and used by the users to describe resources on the web, and that can also serve as a basis for a query language for resource search and retrieval. Work on modeling language emergence and evolution has shown that a group of agents can establish a common language by applying mechanisms such as self-organization, imitation and reinforcement learning [1]. These mechanisms help in the emergence and evolution of a shared language. In this paper, our contribution is the development of agent-based models for studying the dynamics of tagging systems and the possibility of language emergence in such systems.
By applying language emergence mechanisms to a collaborative tagging system, we show that the users of such a system develop a shared language that can be used for resource search and sharing.

Related Work

Among the works close to ours on the evolution of collaborative tagging systems and the emergence of language, we cite the following examples. The work presented in [2] is concerned with the structure of collaborative tagging systems and the dynamics observed in these systems; it shows that the growth of the number of tags is described by a power law. In [3] the evolution of tagging systems is studied within a semiotic perspective; in [4] an epistemic model for collaborative tagging systems is presented. A study of the complex dynamics of tagging systems is presented in [5], where a tripartite model is introduced for modeling the structure of the tagging system. In the work of Santos-Neto and Condon [6], individual and social behaviors in tagging systems are analyzed and a probabilistic model is presented. In the works of Steels and Kaplan [1] [7] [8] [9] [10], a language emergence model is developed in which a group of agents establishes a common language; it can be considered a good framework for studying the emergence and evolution of language.

A. Problematic

This article aims to study the evolution of collaborative tagging systems and the conditions for the emergence of folksonomies; we also present a study of these systems from a linguistic perspective, considering folksonomies as a language, to answer the following questions: Is there a relationship between the emergence of language and the evolution of tagging systems? What are the linguistic foundations that can explain the evolution of these systems towards the emergence of folksonomies? This problematic is addressed using multi-agent-based modeling.

B. The tripartite structure of tagging systems

The tripartite model was theorized in [5]. Three main entities form any collaborative tagging system: the system's users, the tags themselves, and the resources being labeled.

Fig. 1. The tripartite structure.

Each of the three entities may be seen as a separate space forming sets of nodes connected by edges.

C. Definition of folksonomy

A folksonomy is a tuple F := (U, T, R, Y, pt) [4] where U, T, and R are finite sets whose elements are called users, tags and resources respectively; Y is a ternary relation between them, Y ⊆ U × T × R, which represents the tag assignment actions; and pt is a function Y → N which assigns a timestamp to each element of Y, corresponding to the time when the user assigned the tag to the resource.

D. The evolution and dynamics of a tagging system

The study of collaborative tagging systems shows that they evolve over time through complex dynamics [2]. These dynamics are driven by user-system interactions through resource annotation. Observation of tag use and reuse shows that the growth of the number of tags is described by a power law with an exponent smaller than one, and this distribution gives rise to the emergence of folksonomies (Fig. 2).

Fig. 2. Growth of the number of tags in a real tagging system.

II. MODELING THE DYNAMICS OF A TAGGING SYSTEM

For this model, our aim is to reproduce the dynamics observed in collaborative tagging systems, in particular the evolution of the number of tags, which follows a power-law distribution (Fig. 2). The values of the model parameters are taken from the work of Santos-Neto and Condon [6], where individual and social behaviors in tagging systems are analyzed and modeled analytically; we use agent-based modeling rather than an analytical model to show that the complex system approach can also be used, since social web technologies are by nature adaptive complex systems [11].

A.
Model components

A collaborative tagging system consists of three parts: a set of users, a set of tags and a set of resources. These three sets are connected by links made by the collaborative tagging operation. In our model: system users are simulated by software agents; tags are randomly generated words; and resources are represented by keywords, also randomly generated.

B. Model parameters

The most important parameters of our model are: the number of agents that access the system and participate in the tagging operation; the number of most-used tags presented to an agent to allow social imitation; and the similarity threshold for measuring the similarity between two resources. For measuring inter-resource similarity, we use a lexical function defined as

Sim(r_i, r_j) = 2 × Common_prefix(key_word_i, key_word_j) / (Length(key_word_i) + Length(key_word_j))

where r_i, r_j are two resources represented by the two keywords key_word_i and key_word_j respectively, and Common_prefix is a function that returns the number of common characters at the beginning of the two keywords.

C. Interaction

The interaction of the agents with the system is as follows:
1. An agent accesses the system.
2. The system provides that user access to all tags used in previous sessions; the system also provides access to a set of the tags most used by other users.
3. The agent chooses a new resource or an existing resource in the system.
4. The agent has three options for choosing a tag to assign to the selected resource: he chooses a tag from his personal tags (self imitation); he chooses a tag among those most used by the community of tagging system users (social imitation); or he creates a new tag if he determines that the resource belongs to a new category that exists neither in his personomy (personal tags) nor in the folksonomy.
5. He sends the identifier of the resource and the chosen tag to the system as an input "Ui, Ti, Ri, ti", where Ui, Ti, Ri, ti represent the user, the tag, the resource and a timestamp.

D. Results and Discussion

We used the JADE framework and the Java programming language to develop our multi-agent system. We present some results of a simulator based on the model described above. Our goal in this phase is to regenerate the complex dynamics of tagging systems. By modifying the parameters of our model, we obtain the same dynamics observed in real collaborative tagging systems.

Example 1. In the first simulations, we use a very limited number of agents, for example N = 10, and an agent similarity threshold S = 0.4. The figure below shows the evolution of the number of tags over time.

Fig.
3. Evolution of the number of tags.

In Figure 3, the convergence of the number of tags to 671 tags can be observed after 1019 iterations; this is the same dynamics observed in a real collaborative tagging system (a power law).

Example 2. In this example, we use N = 500 agents and a similarity threshold S = 0.4.

Fig. 4. Evolution of the number of tags.

In Figure 4, the convergence of the number of tags to 962 tags can be observed after 1363 iterations; this is the same dynamics observed in the previous example and in real tagging systems. This is due to the principle of individual and social imitation used in our model, based on the reuse of the system's tags, which allows convergence and the emergence of a folksonomy.

E. Summary and Discussion

From the examples above and others, we see the emergence of folksonomies in the simulated tagging system and the generation of the dynamics of such systems. The quality of the folksonomy, in terms of number of tags, depends on the threshold chosen for inter-resource similarity, which defines the degree of freedom the agent has to consider two resources as having the same content and therefore the same tag.
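The dynamics discussed above (self imitation, social imitation, tag invention, and the resulting sublinear growth of the tag vocabulary) can be sketched in a few lines. This is an illustrative simplification, not the authors' JADE implementation: the alphabet, the number of resources, and the 50% social-imitation probability are assumptions, and "most used tags" is reduced to the single most-used tag.

```python
import random

def similarity(w1, w2):
    """2 * common_prefix / (len(w1) + len(w2)), the inter-resource
    similarity function of the model."""
    p = 0
    for a, b in zip(w1, w2):
        if a != b:
            break
        p += 1
    return 2 * p / (len(w1) + len(w2))

def simulate(n_agents=10, n_steps=2000, threshold=0.4, seed=0):
    """Return the number of distinct tags in the folksonomy after
    each tagging event (the curve plotted in Fig. 3 and Fig. 4)."""
    rng = random.Random(seed)
    resources = ["".join(rng.choice("abcdefgh") for _ in range(4))
                 for _ in range(200)]
    personomies = [dict() for _ in range(n_agents)]  # resource -> tag
    tag_counts = {}                                  # folksonomy
    growth = []
    for _ in range(n_steps):
        agent = rng.randrange(n_agents)
        r = rng.choice(resources)
        mine = personomies[agent]
        # self imitation: reuse one's own tag for a similar resource
        tag = next((t for r2, t in mine.items()
                    if similarity(r, r2) >= threshold), None)
        # social imitation: otherwise copy the community's top tag
        if tag is None and tag_counts and rng.random() < 0.5:
            tag = max(tag_counts, key=tag_counts.get)
        # invention: otherwise create a new random tag
        if tag is None:
            tag = "".join(rng.choice("abcdefgh") for _ in range(4))
        mine[r] = tag
        tag_counts[tag] = tag_counts.get(tag, 0) + 1
        growth.append(len(tag_counts))
    return growth
```

Because reuse (self and social imitation) becomes more likely as personomies and the folksonomy fill up, the vocabulary grows quickly at first and then flattens, qualitatively reproducing the sublinear growth the paper reports.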

III. LINGUISTIC MODELING OF THE TAGGING SYSTEM

In this model we are interested in the cognitive, and especially the linguistic, aspect of the collaborative tagging process. To study the linguistic aspect of the tagging process, we enrich the previous model as follows: we endow the agents with cognitive structures (associative memories) that store linguistic knowledge about the assignments of tags to resources (tag/resource couples); we add a reinforcement learning mechanism so that agents can learn and adapt their knowledge by updating their associative memories; and the agents interact with the system following scenarios of language games [1]. This model is based, on its linguistic side, on the models proposed in the work of Kaplan [1].

A. Model parameters

The most important parameters of our model are the same as in the first model: the number of agents that access the system; the number of most-used tags in the system; and the similarity threshold for measuring the similarity between two resources, based on the same function used in the previous model.

B. Interaction

The novelty in the interactions compared to the previous model is that they follow schemes inspired by language games, adapted for collaborative tagging systems: the interactions occur between an agent and the system (not between two agents as in standard language games), where the agent (resp. the system) plays the role of speaker or interlocutor to negotiate the assignment of a word (a tag) to a meaning (a resource). The interaction of the agents with the system is as follows:
1. An agent accesses the system.
2. The system provides that user access to all tags used in previous sessions; the system also provides access to a set of the tags most used by other users.
3. The agent chooses a new resource or an existing resource in the system.
4.
The agent has two options for choosing a tag to assign to the selected resource: he plays the role of speaker, in which case the system plays the role of interlocutor and we are in a situation of self imitation, i.e. the system learns words from the agent's linguistic knowledge; or he plays the role of interlocutor, in which case the system plays the role of speaker and we are in a situation of social imitation, i.e. the agent learns from the system by imitating the overall language knowledge stored in the system's memory. In both situations, there is a negotiation between the speaker and the interlocutor to decide the appropriate tag assignment for the resource in question.
5. The agent adapts its associative memory by inserting a new assignment or modifying a previous one through reinforcement learning.
6. It sends the identifier of the resource and the chosen tag to the system as an input "Ui, Ti, Ri, ti", where Ui, Ti, Ri, ti represent the user, the tag, the resource and a timestamp.

C. Results and Discussion

Examples of simulated scenarios are presented by modifying the model parameters (the number of agents, the similarity threshold). To analyze the emerging language, we use the encoding matrix inspired by [1], redefined as a matrix whose columns are the tags of the system and whose lines are the system's resources; the intersection of a tag "t" with a resource "r" is the probability of putting the resource "r" in the category represented by the tag "t". We then use a graphic representation to present the encoding matrix content in the form of clusters, where each cluster stands for a word (a tag) and the resources that have been tagged by it. We present two simulation scenarios.

1. Example 1. We use N = 10 agents and a similarity threshold S = 0.3 for this example. We calculate the encoding matrix to analyze the emerging language consisting of the emergent categories in the collaborative tagging system.
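The encoding matrix just defined can be computed directly from the log of (user, tag, resource) assignments. This is a minimal sketch, assuming the entry for (t, r) is estimated as the relative frequency with which r was tagged with t; the sample tag names below are hypothetical random words in the style of the paper's examples.

```python
from collections import Counter, defaultdict

def encoding_matrix(assignments):
    """assignments: iterable of (user, tag, resource) tuples.
    Returns {resource: {tag: P(tag | resource)}}, i.e. the probability
    of placing each resource in the category represented by each tag."""
    pair = Counter((t, r) for _, t, r in assignments)
    per_resource = Counter(r for _, _, r in assignments)
    matrix = defaultdict(dict)
    for (t, r), n in pair.items():
        matrix[r][t] = n / per_resource[r]
    return dict(matrix)

def clusters(matrix):
    """Group resources under their most probable tag: one cluster per
    emergent category, as rendered graphically in the figures."""
    out = defaultdict(list)
    for r, probs in matrix.items():
        out[max(probs, key=probs.get)].append(r)
    return dict(out)

# Hypothetical assignment log (tags "xaqe"/"joca" are made-up words):
log = [("u1", "xaqe", "fish"), ("u2", "xaqe", "fish"),
       ("u2", "xaqe", "farm"), ("u1", "joca", "urn")]
```

Running clusters(encoding_matrix(log)) groups "fish" and "farm" under "xaqe" and "urn" under "joca", which is exactly the cluster view used to inspect the emergent categories.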
Fig. 5. Some emergent categories.

After having calculated the encoding matrix, we obtain an emergent language formed of emergent categories and the resources that belong to each category; some emergent categories are shown in the form of clusters in Fig. 5. For example, in the category represented by the word "xaqe" (recall that the tags are randomly generated words), we note that its elements are resources starting with the character "f". For the agents, these resources belong to a common field. The same can be said of the category represented by the tag "joca" and resources starting with "u".

2. Example 2. We use N = 100 agents and a similarity threshold S = 0.3 for this example. Through linguistic analysis of this example, we calculate the encoding matrix. Fig. 6 shows some emergent clusters in our simulated tagging system.

E. Conclusion

In this work, we studied the dynamics governing collaborative tagging systems through a first model using complex-systems-based modeling and multi-agent systems. This modeling approach allows the regeneration of the dynamics observed in collaborative tagging systems through self-organization and a selection mechanism expressed by the principle of self imitation and social imitation. Our main objective was the study of the emergence of a language shared by the users of a tagging system; we treated this objective through the second model, enriching the first model with cognitive structures in the form of associative memories and using interaction scenarios based on language games. We have shown that users of a collaborative tagging system can share an emergent tagging language that has a lexical structure. A tagging language with grammatical structure seems to be a very good perspective and continuation of this work.
The application of these results to existing collaborative tagging systems, or the design of new tagging systems taking into account the linguistic aspect of these systems, are future goals and a continuation of this work. The use of other models of language emergence is another approach to this issue, to better understand the cognitive processes of collaborative tagging.

Fig. 6. Some emergent categories.

For example, the category represented by the word "jesa" contains resources that start with the character "h".

D. Summary and Discussion

Following the simulation results mentioned above, a tagging language shared by the users of the collaborative tagging system, modeled by agents, emerges in the system and in the associative memories of the agents. The agents invent words (tags) in order to describe the resources of the system and to put them in different categories.

REFERENCES

[1] Kaplan, F., L'émergence d'un lexique dans une population d'agents autonomes, Ph.D. thesis, Université Paris VI.
[2] Golder, S., & Huberman, B. A., "The structure of collaborative tagging systems," Journal of Information Science, 32(2).
[3] Cattuto, C., Baldassarri, A., Servedio, V. D. P., & Loreto, V., "Vocabulary growth in collaborative tagging systems," arXiv e-print.
[4] Dellschaft, K., & Staab, S., "An epistemic dynamic model for tagging systems," Proceedings of the 19th ACM Conference on Hypertext and Hypermedia (HT'08).
[5] Halpin, H., Robu, V., & Shepherd, H., "The complex dynamics of collaborative tagging," Proceedings of the 16th International Conference on World Wide Web.
[6] Santos-Neto, E., Condon, D., Andrade, N., Iamnitchi, A., & Ripeanu, M., "Individual and social behavior in tagging systems," Proceedings of the 20th ACM Conference on Hypertext and Hypermedia.
[7] Steels, L., "The synthetic modeling of language origins,"
Evolution of Communication.
[8] Steels, L., & Kaplan, F., "Collective learning and semiotic dynamics," in D. Floreano, J.-D. Nicoud, F. Mondada (eds.), Advances in Artificial Life, Lecture Notes in Artificial Intelligence, Berlin, Springer-Verlag.
[9] Steels, L., "Language games for autonomous robots," IEEE Intelligent Systems.
[10] Steels, L., & Kaplan, F., "Bootstrapping grounded word semantics," in T. Briscoe (ed.), Linguistic Evolution through Language Acquisition: Formal and Computational Models, Cambridge University Press.
[11] Rupert, M., Coévolution d'organisations sociales et spatiales dans les systèmes multi-agents : application aux systèmes de tagging collaboratifs, Ph.D. thesis (Computer Science), Université Claude Bernard, Lyon.

Integrating Linked Open Data in Geographical Information Systems

Tarek ABID, Department of Computer Science, University Mohamed Cherif Messaadia, Souk-Ahras, Algeria
Hafed ZARZOUR, Department of Computer Science, University Mohamed Cherif Messaadia, Souk-Ahras, Algeria

Abstract — This paper describes a novel approach for developing a new generation of web-based geographical information systems that automatically publish, on interactive maps, data extracted from the DBpedia dataset. This integration can be regarded as the solution that connects two domains, namely Geographical Information Systems (GIS) and Linked Open Data (LOD). In order to benefit from the enormous potential of these domains, some basic notions of Linked Open Data must be applied uniformly to each type of geographical data.

Index Terms — GIS, LOD, DBpedia, SPARQL, dataset

I. INTRODUCTION

In recent years, Open Government has become increasingly mature. The essential idea of Open Government is to establish modern cooperation among politicians, public administration, industry and private citizens by enabling more transparency, democracy, participation and collaboration [1]. Open Government Data (OGD) is therefore often seen as a crucial aspect of Open Government. Linked Open Data (LOD) has the same principal function as OGD: it facilitates innovation and knowledge creation from interlinked data and is an important mechanism for information management and integration. LOD initiatives are currently bootstrapping the Web of Data by converting existing datasets into RDF and publishing them to the general public under open licenses. Popular stocks of the semantic web include, among others, DBpedia, a large reference dataset of structured information extracted from Wikipedia providing encyclopaedic knowledge about a multitude of different domains. LOD has gained significant momentum over the past years as a best practice for promoting the sharing and publication of structured data on the semantic Web [2, 3].
In order to benefit from the enormous potential of these domains, some basic notions of Linked Open Data must be applied uniformly to each type of geographical data. We base our research on the work of [4], which addressed the extraction, publication and exploitation of data from the GIS of the Municipality of Catania (a town in Italy), referred to as SIT (Sistema Informativo Territoriale). The SIT is designed to contain all the available data of the public administrations (PA) in Catania for the purpose of in-depth knowledge of the local area. The authors in [4] used tools for extracting data from the SIT and for modelling and publishing them through LOD. Likewise, the LinkedGeoData (LGD) project was presented, which transforms OpenStreetMap (OSM) data to RDF and interlinks it with other knowledge bases. LGD provides a browser interface that enables humans to browse and edit this information efficiently [5]. This paper describes the methodology and tools used to extract, publish and reuse LOD provided by DBpedia and integrated in a GIS on the Web. This integration can be regarded as the solution that connects the two domains, namely Geographical Information Systems (GIS) and Linked Open Data. The paper is organized as follows: Section 1 presents the principal definitions of GIS and the semantic web. Section 2 describes the methodologies and tools for the extraction, modelling and publishing of LOD from DBpedia. Section 3 ends the paper with conclusions and future directions.

II. RELATED WORK

A Geographical Information System (GIS) is a combination of software and hardware that functions to store, map, analyze and display geographical data [6]. A GIS comprises five main components: human, procedure, software, appliance and data.
The human is the person who manages the database and the analysis system and acts as the programmer. The procedure is the way data are keyed in, stored, analyzed and moved through the system. The software is the component providing the package to key in, store, manage and analyze data. The appliance covers the technical necessities, such as a computer system able to run the GIS. Data are a vital component of a GIS and can be divided into two types: geographical and tabular [7]. These data can be collected or bought from commercial data suppliers [8]. Most suppliers use a Database Management System (DBMS) to provide effective data storage and management [9]. Traditionally, there are two broad methods used to store data in a GIS for both kinds of mapping abstractions: raster images and vectors. Steiniger and Weibel [10] identified several types of GIS software:

Desktop GIS: GRASS GIS [11], Quantum GIS [12], ILWIS / ILWIS Open [13], [14], uDig [15], SAGA [16], [17], OpenJUMP [18], MapWindow GIS [19], gvSIG [20]
Spatial Database Management Systems (SDBMS): PostGIS, SpatiaLite
Exploratory Spatial Data Analysis Software: OpenGeoDa, PySAL, R, GeoVista Studio, HiDE
Remote Sensing Software: OSSIM, InterImage, Opticks, GDL, ILWIS, e-foto, GeoDMA, leoworks
Web Map Servers: MapServer, GeoServer, deegree, MapGuide OpenSource, QGIS mapserver
Web Map Application Development Frameworks: OpenLayers, Leaflet, OpenScales, ModestMaps
Server GIS: 52°North WPS, WebGEN, GeoServer, Zoo, PyWPS
Mobile GIS: gvSIG Mobile, Geopaparazzi

RDF (Resource Description Framework) [21] builds on XML to better manage semantic interoperation. RDF is a data model designed to standardize the definition and use of metadata, in order to better describe and handle data semantics. A document in RDF is a set of triples (subject, predicate, object) that can be distributed, stored and processed in scalable triple stores [22]. The Web of Data, often referred to as the Semantic Web or Linked Data, denotes a new generation of technologies responsible for the evolution of the current Web [23] from a Web of interlinked documents to a Web of interlinked data. The goal is to discover new knowledge and value from data, by publishing them using Web standards (primarily RDF) and by enabling connections between heterogeneous datasets [24]. Linked Open Data (LOD) denotes a set of best practices for publishing and linking structured data on the Web. The project includes several RDF datasets interlinked with each other to form a giant global graph, the so-called Linked Open Data cloud [24]. DBpedia is a project aiming to extract structured content from the information created as part of the Wikipedia project. This structured information is then made available on the World Wide Web [25].
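The triple model described above is easy to make concrete: an RDF document is a set of (subject, predicate, object) statements, and querying amounts to pattern matching with variables. The toy in-memory store below only illustrates the data model (prefixed names are written as plain strings, and the store itself is an assumption, not a real triple-store API); a production system would use a scalable triple store as noted in [22].

```python
# Toy illustration of the RDF data model: a set of
# (subject, predicate, object) triples with wildcard matching.

class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """None acts as a wildcard, playing the role of a SPARQL variable."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TripleStore()
store.add("dbr:Algiers", "rdf:type", "dbo:City")
store.add("dbr:Algiers", "dbo:country", "dbr:Algeria")
store.add("dbr:Oran", "rdf:type", "dbo:City")
```

A query such as "all cities" becomes store.match(p="rdf:type", o="dbo:City"); linking heterogeneous datasets then amounts to adding triples whose subjects and objects come from different sources.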
The data are automatically extracted from freely available Wikipedia dumps, and each article in Wikipedia is represented by a corresponding resource URI in DBpedia. Several RDF statements are generated for each resource by extracting information from various parts of the Wikipedia articles [24]. All versions of DBpedia together describe 24.9 million things, of which 16.8 million overlap with the concepts from the English DBpedia. The English version of the DBpedia knowledge base currently describes 4.0 million things, including 832,000 persons, 639,000 places, 372,000 creative works (including 116,000 music albums, 78,000 films and 18,500 video games), 209,000 organizations (including 49,000 companies and 45,000 educational institutions), 226,000 species and 5,600 diseases. Table 1 gives an overview of the full DBpedia dataset.

TABLE I. CONTENT OF THE DBPEDIA DATASET
Labels and abstracts: 12.6 million unique things in different languages
Links to images: 24.6 million
Links to external web pages: 27.6 million
External links into other RDF datasets: 45.0 million
Links to Wikipedia categories: 67.0 million
YAGO categories [26]: 41.2 million

The dataset consists of 2.46 billion pieces of information (RDF triples), out of which 470 million were extracted from the English edition of Wikipedia, 1.98 billion were extracted from other language editions, and about 45 million are links to external datasets.

III. PROPOSED APPROACH
This section describes the methodologies and tools for extracting, modelling and publishing LOD from DBpedia. The methods are based on the standards of the W3C, which is the reference point for the Semantic Web in general, and for LOD in particular. The general aim of this work is to create a kind of mixture between GIS and LOD, so that any user can use GIS software to search for a city, a country or a village. Due to time constraints, we used two DBpedia files, namely person and organization.
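The correspondence between Wikipedia articles and DBpedia resource URIs described above can be sketched in JavaScript. The helper name dbpediaResourceUri is our own, but the http://dbpedia.org/resource/ namespace is the one DBpedia actually uses for resources.

```javascript
// Hypothetical helper: derive the DBpedia resource URI for a Wikipedia
// article title. DBpedia resources live under http://dbpedia.org/resource/
// and spaces in article titles become underscores.
function dbpediaResourceUri(articleTitle) {
  return "http://dbpedia.org/resource/" + articleTitle.trim().replace(/ /g, "_");
}
```

For example, the article "Souk Ahras" maps to http://dbpedia.org/resource/Souk_Ahras.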
For the GIS front end we chose the Google Maps API, because it offers the end user simplicity, immediacy and accessibility, and it allows the user to navigate through a friendly interface. In the following we describe the various stages of the semantic interoperability methodology and the tools used for extracting data from DBpedia and for modelling and publishing them through LOD (Fig. 1). The whole methodology can be summarised in the following main steps:
Geographical visualization (stage A)
Information extraction by SPARQL from DBpedia (stage B)
Viewing results (stage C)
Figure 1 shows an overview of the functioning of our proposal, which consists, on the one hand, of using the concepts of GIS and, on the other hand, of applying the principles of LOD. A user views a city through the Google Maps API; from a given location, JavaScript sends the reference of this city to SPARQL, which extracts the required data from DBpedia, and the results are displayed in a well-defined format.

[Figure 1 residue: numbered flow between the user, the JavaScript SPARQL client, the SPARQL endpoint and DBpedia; the name of a country, town, city or village goes in, and information about persons, organizations, etc. comes back.]

Fig. 1. System architecture.

A. Geographical visualization
The geographical visualization shows the names of cities as layers with icons; we used Google Maps API v3 (Fig. 1). The map is centred and zoomed to include all the locations retrieved. We also used the mapping services provided by Google, because they offer the end user simplicity, immediacy, accessibility and responsiveness [27, 28]; these services are: directions, distance matrix, elevation, geocoding, maximum zoom imagery and street view.

B. Information extraction by SPARQL from DBpedia
SPARQL (SPARQL Protocol and RDF Query Language) was created by the DAWG working group (RDF Data Access Working Group) of the W3C (World Wide Web Consortium). It is a query language and a protocol for searching, adding, editing or deleting RDF data available over the Internet [29, 30]. It retrieves information from SPARQL servers (SPARQL endpoints); SPARQL can query DBpedia through the DBpedia website. To extract or publish data for a specific location (Fig. 2), we propose the following extraction behaviour: when the user moves over a city or country, an info-window with the name of the location and its capital city is displayed; by clicking on an icon, detailed information (famous persons and organisations) about this location is shown.

Fig. 2. Example of geo-localisation.

Figure 3 shows SPARQL source code that extracts, from the "person" file of DBpedia, the name, date of birth and date of death of famous personalities of Souk Ahras; these personalities are ordered by name.

Fig. 3. Example of SPARQL code extracting famous persons of Souk Ahras.

We also propose to use JavaScript to retrieve the name of the location (city) and pass this parameter on to SPARQL (Fig. 4).
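The SPARQL code of Fig. 3 is not reproduced in the text, but a query of the shape it describes (name, date of birth and date of death of famous persons of a place, ordered by name) can be sketched as a JavaScript query builder. The function name and the DBpedia properties used (foaf:name, dbo:birthPlace, dbo:birthDate, dbo:deathDate) are illustrative assumptions, not taken from the paper.

```javascript
// Sketch of a query builder in the spirit of Fig. 3: select name, birth
// date and death date of persons born in a given place, ordered by name.
// The property names below are assumptions for illustration.
function buildPersonQuery(placeResource) {
  return [
    "PREFIX dbo: <http://dbpedia.org/ontology/>",
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>",
    "SELECT ?name ?birth ?death WHERE {",
    "  ?person dbo:birthPlace <" + placeResource + "> .",
    "  ?person foaf:name ?name .",
    "  OPTIONAL { ?person dbo:birthDate ?birth . }",
    "  OPTIONAL { ?person dbo:deathDate ?death . }",
    "} ORDER BY ?name"
  ].join("\n");
}
```

Calling buildPersonQuery("http://dbpedia.org/resource/Souk_Ahras") yields a query of the kind the paper sends to the DBpedia endpoint.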
JavaScript is a scripting language created for making HTML pages live; it turns the Web into something more powerful than just interlinked HTML pages. There are JavaScript libraries for SPARQL, but it is actually quite simple to query SPARQL from JavaScript without using any special library. Here is an example of making a SPARQL query directly from a web page using JavaScript.
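A minimal, library-free sketch of such a query, assuming DBpedia's public endpoint at http://dbpedia.org/sparql and its format parameter for requesting JSON results; only the URL construction is shown as a pure function, the actual request being left as a comment.

```javascript
// A SPARQL SELECT request is just an HTTP GET with the query passed as a
// URL parameter; asking for JSON results lets plain JavaScript consume
// them without any SPARQL library.
function sparqlRequestUrl(endpoint, query) {
  return endpoint +
    "?query=" + encodeURIComponent(query) +
    "&format=" + encodeURIComponent("application/sparql-results+json");
}

// In a browser one would then do, for example:
//   fetch(sparqlRequestUrl("http://dbpedia.org/sparql", "SELECT ..."))
//     .then(function (r) { return r.json(); })
//     .then(function (data) { /* iterate over data.results.bindings */ });
```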

Fig. 4. The source code for the OnClick event in JavaScript.

Finally, SPARQL searches the DBpedia dataset and extracts the necessary information about the location.

C. Viewing results
At the end of the last stage, SPARQL has extracted the results from DBpedia. To display these results we can use the DataTables plug-in for jQuery. DataTables is a highly flexible tool, built upon the foundations of progressive enhancement, that adds advanced interaction controls to any HTML table. jQuery is a fast, small and feature-rich JavaScript library; it simplifies tasks such as HTML document traversal and manipulation, event handling and animation. At any moment, the user can download the tabular data to the local computer in a few formats (pdf, html, xls) or just copy them to the clipboard for further analysis.

IV. DISCUSSION
The main research contribution of this work is to connect two domains, Geographical Information Systems and Linked Open Data, in order to benefit from their enormous potential. We have presented several technologies which together show how simple maps can be enriched using background knowledge from the Linked Open Data datasets provided by DBpedia. The proposed solution is in the process of final implementation using several frameworks: SPARQL, XML, JavaScript and jQuery. We used the Google Maps API as a visualization tool, and we chose DBpedia as the database from which to extract the required information. This work also suggests other ways to address the problem of data heterogeneity; for example, the standard structure and format of the DBpedia dataset files (e.g. species, diseases) can be used to extract the different diseases of cities or to extract sets of species.

One application scenario of the proposed approach is to use Linked Open Data within the current version of OpenStreetMap in order to create a new version that takes into account not only the traditional browsing aspect but also modern, interactive browsing based on Semantic Web methodologies. Our solution is generic: GeoSPARQL could replace the existing SPARQL and PHP could replace JavaScript. Another GIS could also be used to visualize different layers of data, but the problem is that we must then know the function and the role of all components of such a system. Another interesting property is the ability to replace DBpedia with another database without affecting the system. Finally, to give more relevance to this work, we intend to continue and implement the rest of this proposal.

V. CONCLUSION
In the first part we presented a classification of the different GIS used in research and industry. Owing to the huge number of GIS tools, we cannot master them all; we therefore chose the Google Maps API, because it provides user-friendly services. We also presented the basic notions of the Semantic Web (Web of Data), e.g. LOD and DBpedia. The general aim of this research is to combine the GIS and LOD domains. To realize our prototype, we used JavaScript and SPARQL as programming tools and, among GIS tools, we chose Google Maps API v3. When a user finds a particular city and clicks on it, JavaScript gets its name and sends a query to SPARQL. The latter searches the DBpedia database and extracts the necessary information about this location. Finally, SPARQL returns the results in a few formats (html, pdf, xls, ...) for precise statistics. To give more relevance to this work, we intend to continue and implement the rest of this proposal. The current prototype, integrating the Google Maps API and the SPARQL framework, is considered the first result of this work. We could also use GeoSPARQL, the Open Geospatial Consortium (OGC) standard for the representation and querying of geospatial linked data on the Semantic Web.

REFERENCES
[1] F. Bauer and M. Kaltenböck, Linked Open Data: The Essentials, Edition mono/monochrom, Vienna.
[2] T. Berners-Lee, Y. Chen, L. Chilton, D. Connolly, R. Dhanaraj, J.
Hollenbach, A. Lerer and D. Sheets, Tabulator: Exploring and analyzing linked data on the semantic web, in Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI 2006), Athens, USA, 2006.
[3] C. Bizer, T. Heath and T. Berners-Lee, Linked Data - The Story So Far, International Journal on Semantic Web and Information Systems, 5(3), 2009.
[4] S. Consoli, A. Gangemi, A. G. Nuzzolese, S. Peroni, V. Presutti, D. R. Recupero and D. Spampinato, Geolinked Open Data for the Municipality of Catania, in Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS '14), p. 58, ACM, June 2014.
[5] C. Stadler, J. Lehmann, K. Höffner and S. Auer, LinkedGeoData: A core for a web of spatial open data, Semantic Web, 3(4), 2012.
[6] M. Zhang, C. He and G. Liu, Application of GIS in electronic commerce, IEEE Second IITA International Conference on Geoscience and Remote Sensing, 2010.
[7] M. H. Selamat, M. S. Othman, N. H. M. Shamsuddin, N. I. M. Zukepli and A. F. Hassan, A Review on Open Source Architecture in Geographical Information Systems,

International Conference on Computer & Information Science (ICCIS), 2012.
[8] H. Zarzour, T. Abid and M. Sellami, Conflict-free collaborative decision-making over Mind-Mapping, 4th International Conference on Advanced Computing & Communication Technologies, ACCT '14, Rohtak, India, February 2014.
[9] M. R. Luaces, N. R. Brisaboa, J. R. Paramá and J. R. Viqueira, A Generic Framework for GIS Applications, in Proc. W2GIS, 2004.
[10] S. Steiniger and R. Weibel, GIS software - a description in 1000 words, Encyclopedia of Geography, 1-2.
[11] M. Neteler and H. Mitasova, Open source GIS: A GRASS GIS approach (3rd ed.), Berlin: Springer.
[12] M. Hugentobler, Quantum GIS, in S. Shekhar and H. Xiong (Eds.), Encyclopedia of GIS, New York: Springer, 2008.
[13] C. R. Valenzuela, ILWIS overview, ITC Journal, 1988.
[14] T. Hengl, S. Gruber and P. D. Shrestha, Digital terrain analysis in ILWIS, International Institute for Geo-Information Science and Earth Observation, Enschede, The Netherlands, 62.
[15] P. Ramsey, uDig desktop application framework, presentation at FOSS4G.
[16] O. Conrad, Entwurf, Funktionsumfang und Anwendung eines Systems für Automatisierte Geowissenschaftliche Analysen, PhD thesis, SAGA, University of Göttingen.
[17] V. Olaya, A gentle introduction to SAGA GIS, The SAGA User Group e.V., Göttingen, Germany, 208.
[18] S. Steiniger and M. Michaud, The Desktop GIS OpenJUMP: A hands-on introduction, OGRS 2009 Workshop, Nantes, France.
[19] D. P. Ames, C. Michaelis and T. Dunsford, Introducing the MapWindow GIS project, OSGeo Journal, 2(1).
[20] A. Anguix and L. Diaz, gvSIG: A GIS desktop solution for an open SDI, Journal of Geography and Regional Planning, 1(3), 2008.
[21] F. Manola and E.
Miller, Resource Description Framework (RDF) primer, W3C Recommendation, 10.
[22] H. Zarzour and M. Sellami, SRCE: a collaborative editing of scalable semantic stores on P2P networks, International Journal of Computer Applications in Technology, vol. 48, no. 1, 2013.
[23] T. Heath and C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Synthesis Lectures on the Semantic Web, Morgan & Claypool Publishers, 2011.
[24] T. Di Noia, R. Mirizzi, V. Ostuni and D. Romito, Exploiting the web of data in model-based recommender systems, in Proceedings of the Sixth ACM Conference on Recommender Systems, ACM, September 2012.
[25] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak and S. Hellmann, DBpedia - A crystallization point for the Web of Data, Web Semantics: Science, Services and Agents on the World Wide Web, 7(3), 2009.
[26] F. M. Suchanek, G. Kasneci and G. Weikum, YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia, Banff, Alberta, Canada, ACM, May 2007.
[27] E. Koua, A. MacEachren and M. Kraak, Evaluating the usability of visualization methods in an exploratory geovisualization environment, International Journal of Geographical Information Science, 20(4), 425, 2006.
[28] E. Koua and M. Kraak, A usability framework for the design and evaluation of an exploratory geovisualization environment, in Proceedings of the International Conference on Information Visualization, 2004.
[29] H. Zarzour and M. Sellami, B-Set: a synchronization method for distributed semantic stores, IEEE International Conference on Complex Systems, ICCS'12, Agadir, Morocco, November 2012.
[30] H. Zarzour and M. Sellami, p2pCoSU: A P2P Sparql/update for collaborative authoring of triple-stores, IEEE 11th International Symposium on Programming and Systems, ISPS'13, Algiers, Algeria, April 2013.

Using FoCaLiZe to Check OCL constraints on UML classes

Messaoud Abbas
CPR CEDRIC, ENSIIE, Square de la Résistance, Évry, France.
USTHB, LSI, BP 32 El-Alia, Bab Ezzouar, Algiers, Algeria.
College of Sciences and Technology, El-Oued University, El-Oued, Algeria.

Abstract: In this paper we propose a mapping from a subset of OCL constraints specified on UML classes into FoCaLiZe, and use this mapping to check OCL constraints. FoCaLiZe is an object-oriented development environment for certified programming. It is based on a purely functional language, covers the whole development life cycle and contains all the first-order and higher-order logic constructors needed to transform OCL constraints intuitively. In practice, we can use Zenon, the automated theorem prover of FoCaLiZe, to check OCL constraints. We describe the mapping in such a way that it can be automated, and we illustrate our translation with examples.

Keywords: UML, OCL, FoCaLiZe, proof, semantics.

1. Introduction
The Unified Modeling Language (UML) [1] is a graphical standard for describing systems in an object-oriented way. In order to enhance the UML graphical notations with formal specifications, the Object Constraint Language (OCL) [2] has been introduced. OCL is the current standard of the Object Management Group (OMG) [3]. An OCL constraint is a precise textual statement which may be attached to any UML element. An important use of OCL expressions is the description of constraints on the UML model, in particular invariants on UML classes and pre- and postconditions of class operations. In the last decade, several approaches have been carried out to provide tools for checking OCL constraints; among them are those that translate OCL expressions into formal languages. Some attempts, like [4], [5], [6] and UML2B [7], present a mapping of UML/OCL models into the B language and use the formal proofs available in B to analyse and check OCL constraints.
In UMLtoCSP [8], we find a mapping of OCL constraints into constraint programming expressions to check UML class diagrams annotated with OCL constraints. Equational logic, first-order logic and higher-order logic are also considered: ITP-OCL [9], [10] maps OCL expressions to support their automated evaluation using term rewriting; KeY [11] proposes a mapping from OCL into first-order logic to permit interactive reasoning about UML diagrams with OCL constraints; and HOL-OCL [12], [13] maps OCL constraints into higher-order logic. Also, [14] proposes a mapping from a subset of OCL into first-order logic to check the unsatisfiability of OCL constraints, and [15] proposes a translation of UML/OCL models into the Maude system. Finally, the Alloy system has recently been used in works such as [16] and [17] to check the consistency of UML/OCL models. An OCL constraint is attached to a UML classifier (its context), in particular a UML class. To obtain a faithful transformation of OCL constraints, we first need a correct transformation of the UML class hierarchy into the target formal language; the OCL constraints may then be converted into logical formulas using the derived specification. However, we noticed that in all the aforementioned works the transformation of UML classes is indirect, because of the lack of UML conceptual and architectural features in the supported formal tools. For example, none of the above formal tools supports features such as multiple inheritance, function redefinition, late binding, template classes and template binding. This mismatch and large gap between UML and these formal tools leads to losing the semantics of UML classes and hence of OCL constraints. In this paper we propose a formal transformation from UML classes annotated with OCL constraints into the FoCaLiZe environment [18]. The choice of FoCaLiZe does not rely solely on its formal aspects.
FoCaLiZe supports most of the requirements mandated by standards for the assessment of a software development life cycle [19]. More precisely, our choice is

motivated by the three following arguments. First, FoCaLiZe supports most UML conceptual and architectural features, such as encapsulation, inheritance (generalization/specialization) and multiple inheritance, function redefinition, late binding, dependency, UML template classes and template bindings, through its own constructs, without additional structures or invariants. The second motivation lies in the paradigm of its language: the FoCaLiZe language is a functional language (very close to the usual ML family languages), which avoids side effects during program execution. Finally, the use of the FoCaLiZe environment is also motivated by the availability of its automated theorem prover Zenon [20] and its proof checker Coq [21]. Realizing proofs with Zenon makes user intervention much easier, since it manages to fulfill most of the proof obligations automatically. In addition, whenever such a proof fails, Zenon helps the user to locate the source of the inconsistency. In the last step, Coq validates the proof. The FoCaLiZe conceptual and architectural features allow us first to naturally transform the UML classes (on which the OCL constraints are specified) into a FoCaLiZe hierarchy that matches the UML class hierarchy perfectly and preserves its semantics, and then to map the OCL constraints (class invariants, preconditions and postconditions) into FoCaLiZe logical formulas. Later on, a FoCaLiZe user can prove the derived properties using Zenon and automatically certify these proofs with Coq. Also, using FoCaLiZe we find it very intuitive to convert the OCL @pre construct, which is rarely considered or hardly converted in the aforementioned works. The current document is organized as follows: section 2 presents FoCaLiZe and UML/OCL concepts, sections 3 and 4 describe the mapping from UML classes and OCL constraints into FoCaLiZe, and in section 5 we present a framework to prove OCL constraints using the FoCaLiZe proof language.

2. FoCaLiZe and UML/OCL concepts
2.1.
FoCaLiZe
The FoCaLiZe [18] environment, initiated by T. Hardin and R. Rioboo, is an integrated development environment with formal features. FoCaLiZe is a purely functional language where data are abstracted through collections, which are described by species. A species groups together function signatures (introduced by the keyword signature), function definitions (introduced by the keyword let), logical statements (introduced by the keyword property) and statements together with their proofs (introduced by the keyword theorem). Functions and properties operate on a representation (carrier) type, introduced by the keyword representation. These values are then abstracted and denoted by the keyword Self within a species. A collection is derived from a species by implementing all its functions and proving all its statements. The general syntax of a species is given as follows:

species species_name =
  [representation = rep_type;]
  signature function_name : function_type;
  [local] let function_name = function_body;
  property property_name : specification;
  theorem theorem_name : specification
    proof = theorem_proof;
end ;;

Species have an object flavor and can inherit [22] from other species. A species can also be parameterized, either by collections (of a specified species) or by entities of a collection. The general form of a species declaration is:

species species_name (collection_name is species_app, entity_name in collection_name, ...) =
  inherit species_app, ...;
  ...
end;;

where species_app is the application of a defined species to its parameters. The syntax to define collections is:

collection collection_name =
  implement species_app;
end;;

A particular species in the FoCaLiZe library is the species Setoid.
It is a general species representing the data structure of entities belonging to a decidable non-empty set, which can be submitted to an equality test:

species Setoid =
  inherit Basic_object;
  signature equal: Self -> Self -> bool;
  signature element: Self;
  property equal_reflexive: all x: Self, equal (x, x);
  property equal_symmetric: all x y: Self, equal (x, y) -> equal (y, x);
  property equal_transitive: all x y z: Self, equal (x, y) -> equal (y, z) -> equal (x, z);
  ...
end;;

2.2. UML classes and OCL constraints
A UML class is characterized by its name and contains attributes and operations. Each attribute (instance or state variable) of a class has a name and a specification, which is either a primitive type (Integer, Real, UnlimitedNatural, Boolean, String) or another class of the model. Specifications may describe multiple values of a type. Each operation (method) has

a name, parameters and a specification. Parameters are pairs formed by names and their specifications. The Object Constraint Language (OCL) [2] allows us to enhance UML graphical notations with logical formulas. It is the current standard of the Object Management Group (OMG) [3]. An OCL constraint may be attached to any UML element (its context) to further specify it; in particular, an OCL constraint may be attached to a UML class. We consider class invariants and pre- and post-conditions of class operations. OCL constraints are described in more detail in section 4.

3. From UML classes into FoCaLiZe
Since most of the UML design features can seamlessly be represented in FoCaLiZe [22], [23], we transform a UML class into a FoCaLiZe species. The UML inheritance mechanism is mapped to FoCaLiZe inheritance, function redefinition in UML is modeled by function redefinition in FoCaLiZe, the UML late binding feature is supported by the same feature in FoCaLiZe, UML template classes correspond to parameterized species in FoCaLiZe [24], and UML template bindings are equivalent to parameter substitutions in FoCaLiZe. For brevity, we do not detail here elements such as inheritance, dependencies, instantiations or associations. The carrier type (representation) of the derived species is a record created from the types of the instance variables. Each class attribute is also modeled by a getter function in the corresponding species, and UML class operations are converted into functional signatures of the species. To illustrate class transformation, Table 1 shows the example of the class Person. In this example, the carrier type of the species PersonSpecies is string * (int), the cartesian product of the types of the attributes name and age. The function newperson is the transformation of the class constructor.

3.1. Semantics and transformation rules
The function newperson is the transformation of the class constructor Semantics and transformation rules In order to provide a formal framework for our mapping, we propose an abstract syntax for UML and OCL considered constructs. During the transformation process we will maintain two contexts, Γ U for UML and Γ F for FoCaLiZe. Γ U is defined as follows: For a given class named c n, Γ U (c n) = (c n = V c n), where V c n is the value of the class. A class value is a pair (Γ c n, body c n) composed of the local context of the class and the body of the class. The body of the class is composed of class attributes and class operations. In other words, the body of the class c n is expressed as follows (the trailing star sign denotes several occurrences): body c n = (Attr c n, Opr c n), where Attr c n = {(att n : att type)}, att n is an attribute name of the class c n and att type its type. Similarly Opr c n = {(op n : op type)} where op n is an operation name of the class c n and op type its parameters and returned types. In a symmetrical way, the FoCaLiZe typing context, is defined as follows: For a given species named s n, Γ F (s n) = (s n = V s n), where V s n is the value of the species. A species value is a pair (Γ s n, body s n) composed of the local context of the species together with its body. A species body is composed of its representation (Rep), signatures (Sig), function definitions (Let), properties (P rop) and proofs (P roof). The body of the species s n is denoted: body s n = (Rep s n, Sig s n, Let s n, P rop s n, P roof s n). For an UML element U we will denote [[U]] ΓU,Γ F its translation into FoCaLiZe. For an identifier ident, upper(ident) returns ident with its first character capitalized and lower(ident) returns ident with its first character in lowercase. 
The general definition of a UML class is given as follows:

<class_def> = [public | private | protected] [final | abstract] [«class-stereotype»]
  class c_n (P) binds T depends D inherits H = A; O end

where c_n is the name of the class, P is a list of parameter declarations, T is a list of substitutions of formal parameters by actual parameters, H designates the list of classes from which the current class inherits, D is a list of dependencies, A is the attribute list of the class and O its operations (methods). The derived species is obtained after translating [[P]]_{Γ_U,Γ_F}, [[T]]_{Γ_U,Γ_F}, [[D]]_{Γ_U,Γ_F}, [[H]]_{Γ_U,Γ_F}, [[A]]_{(Γ_U Γ_c_n),Γ_F} and [[O]]_{(Γ_U Γ_c_n),Γ_F} (the corresponding transformation rule is illustrated in Figure 1). Due to space limits, we do not detail these rules here. These rules enable us to progressively enrich the local context of the current class Γ_c_n, the local context of the derived species Γ_s_n and their bodies.

4. From OCL constraints into FoCaLiZe
An OCL constraint is an expression of the OCL language [2] which uses types and operations on types. We distinguish between primitive types (Integer, Boolean, Real and String), enumeration types,

Table 1: Transformation of the class Person

UML: (class diagram of Person, with attributes name and age and operations newPerson, set_age and birthdayHappens; figure not reproduced)

FoCaLiZe:
species PersonSpecies =
  representation = string * (int);
  let get_name (p: Self): string = fst(p);
  let get_age (p: Self): int = snd(p);
  signature newperson: string -> int -> Self;
  signature set_age: Self -> int -> Self;
  signature birthdayhappens: Self -> Self;
end;;

[<class_def>]_{Γ_U,Γ_F} =
species s_n ([P]_{Γ_U,Γ_F}, [T]_{Γ_U,Γ_F}, [D]_{Γ_U,Γ_F}) =
  inherit Setoid, [H]_{Γ_U,Γ_F};
  [A]_{Γ_U,Γ_F};
  [O]_{Γ_U,Γ_F}
end ;;

Figure 1: Transformation of a class definition

object types (classes of the UML model) and collection types. Collection types are Collection(T) and its sub-types Set(T), OrderedSet(T), Bag(T) and Sequence(T), where T is a type. To transform OCL expressions we have built an OCL framework library. In this library we model primitive types using the FoCaLiZe primitive types predefined in the basics library (int, bool, string and float), for which we defined the corresponding operations. For collection types, we have a species named OCL_Collection(Obj is Setoid) implementing the OCL operations on collections (forAll, isEmpty, size, ...). The other kinds of collections (Set(T), OrderedSet(T), Bag(T) and Sequence(T)) are also described by species which inherit from OCL_Collection.

4.1. Mapping of primitive types
The OCL type Integer is modeled by the FoCaLiZe type int (basics#int). Table 2 shows the mapping of OCL operations on Integer into FoCaLiZe. In this table, a and b are two OCL integer values and x and y are their semantics in FoCaLiZe. The Boolean, String and Real types are handled similarly, using the bool, string and float types of FoCaLiZe. An OCL enumeration type is modeled in FoCaLiZe by an equivalent sum type.

4.2. Mapping of Collection types
For a given OCL type T (Integer, Real, UML class, ...), the OCL type Collection(T) represents a collection of elements of type T. The types Set(T), OrderedSet(T), Bag(T) and Sequence(T) are sub-types of Collection(T).
The OCL type Collection(T) is modeled by the following species of our library:

OCL_Collection(Obj is Setoid) = ... end;;

Table 2: Mapping of operations on Integer
OCL operation : FoCaLiZe semantics
a = b : x =0x y
a <> b : ~ (x =0x y)
a + b : x + y
a - b : x - y
-a : -x
a * b : x * y
a div b : x / y
a mod b : x % y
a < b : x <0x y
a <= b : x <=0x y
a > b : x >0x y
a >= b : x >=0x y
a.min(b) : min0x(x, y)
a.max(b) : max0x(x, y)
a.abs : abs0x(x)
a / b : real_div(x, y)

This species is parametrized by the formal parameter Obj, where (Obj is Setoid). The species Setoid is the general species defined in the FoCaLiZe library (see Sec. 2.1). The formal parameter Obj is to be substituted by any other species to model any kind of collection. For example, the OCL type Collection(Person) is converted into the species OCL_Collection(Obj is PersonSpecies), where PersonSpecies (see Table 1) is the species derived from the class Person. The OCL type Collection(T) has several common operations. For brevity, we only present here the specification (signatures + properties) of the following common operations:

C -> includes(obj: T): Boolean: returns true if obj belongs to the collection C.
C -> isEmpty(): Boolean: returns true if the collection C is empty.
C -> size(): Integer: returns the number of elements of the collection C.
T.allInstances: Collection(T): returns the collection of all instances of the type T.
C -> forAll(t: T | expr-containing-t): Boolean: returns true if all elements of C satisfy expr.
C -> exists(expr: Boolean): Boolean: returns true if there exists an element of C satisfying expr.

The specifications of these operations are given in the species OCL_Collection of our library as follows:

species OCL_Collection(Obj is Setoid) =
  signature empty_collection : Self;
  signature includes : Obj -> Self -> bool;
  signature size : Self -> int;
  let isempty (c : Self): bool = (c = empty_collection);
  (* the empty collection has 0 elements *)
  property empty_collection_spec : size(empty_collection) = 0;
  ...
end;;

Properties stated after each operation signature, such as empty_collection_spec, are useful in the proofs of constraints that use operations on collections. For the operations allInstances, forAll and exists, although it is possible to specify them in the species OCL_Collection, we prefer to translate them using FoCaLiZe universal (∀) and existential (∃) quantification, which looks simpler and more practical to us. A transformation example of an OCL constraint containing the allInstances and forAll expressions is presented in Table 3.

4.3. Mapping of OCL constraints
All OCL constraints are mapped into FoCaLiZe properties (property or theorem) in a particular species. For example, all the OCL constraints that have the class Person as context are transformed and proved in a species that we call Person_constraints. The latter inherits from the species PersonSpecies derived from the class Person. The OCL expressions describing invariants, preconditions and postconditions are then converted into equivalent FoCaLiZe expressions.
We have proposed a formal transformation rule for each supported OCL construct. The first example in Table 3 presents the transformation of an invariant on the class Person specifying that the value of the attribute age is greater than or equal to 0 and less than or equal to 150. The second example in Table 3 shows another invariant that makes sure that all instances of the class Person have unique names. It uses the allInstances and forAll operations of OCL. The third example in Table 3 presents the transformation of a pre- and post-condition associated to the operation set_age of our class Person.

In an OCL post-condition, to refer to the value of an attribute at the start of the operation (just before the invocation of the operation), one has to post-fix the attribute name with the commercial at followed by the keyword pre (@pre). For example, when the birthday of a person happens, we can specify the following post-condition of the operation birthdayHappens():

context Person::birthdayHappens()
post : age = age@pre + 1

Unlike in other formalisms, we find it very easy to map @pre into FoCaLiZe, since FoCaLiZe distinguishes between the entity which invokes the function and the entity returned by the function. The transformation of the above example is the following:

property post_birthdayhappens : all x : Self,
  get_age(birthdayhappens(x)) = get_age(x) + 1;
end ;;

4.4. Semantics and transformation rules

During the transformation of OCL constraints into FoCaLiZe, we maintain a typing context Γ_O in addition to our previous contexts Γ_U and Γ_F. For an OCL element S, we denote [[S]]_{Γ_U,Γ_F,Γ_O} its transformation into FoCaLiZe. To be brief, we only present the general transformation of OCL invariants and pre- and post-conditions. Let S_i be an OCL invariant associated to the class named c_n. Its general form is:

S_i = context c_n inv : E_inv

where E_inv is the OCL expression describing the invariant S_i.
The invariant S_i is converted into a FoCaLiZe property as follows:

[[S_i]]^inv_{Γ_U,Γ_F,Γ_O} = property inv_ident : all e : Self, [[E_inv]]^exp_{Γ_U,Γ_F,Γ_O} ;

where inv_ident is an identifier that we assign to the invariant S_i and [[E_inv]]^exp_{Γ_U,Γ_F,Γ_O} is the transformation into FoCaLiZe of the OCL expression describing the invariant.
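To make the rule concrete, here is a toy Python rendering of it (an illustration under simplifying assumptions, not the paper's implementation): given an identifier and the already-translated body [[E_inv]], it emits the FoCaLiZe property text quantified over Self.

```python
# Toy rendering of the invariant transformation rule:
# "context c inv: E"  becomes  "property ident : all e : Self, [[E]];"
def translate_invariant(inv_ident: str, e_inv_focalize: str) -> str:
    # e_inv_focalize stands for [[E_inv]], the already-translated expression.
    return f"property {inv_ident} : all e : Self, {e_inv_focalize};"

print(translate_invariant(
    "invariant_person_1",
    "(get_age(e) <= 150) /\\ (get_age(e) >= 0)"))
# property invariant_person_1 : all e : Self, (get_age(e) <= 150) /\ (get_age(e) >= 0);
```

The contexts Γ_U, Γ_F, Γ_O are elided here; in the actual rule they drive the translation of the expression itself.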

Let S_j be an OCL pre- and post-condition associated to the operation OP of the class named c_n. Its general form is:

S_j = context c_n :: OP_n(p_1 : typeexp_1, ..., p_m : typeexp_m) : returnType
      pre : E_pre
      post : E_post

where OP_n is the operation name, p_1 ... p_m are the operation parameters, typeexp_1 ... typeexp_m their corresponding types, and E_pre and E_post are the OCL expressions describing the pre- and post-conditions. An OCL pre- and post-condition is converted into a FoCaLiZe implication (pre-condition -> post-condition) as follows:

[[S_j]]^prepost_{Γ_U,Γ_F,Γ_O} = property pre_post_ident : all e : Self,
    all x_1 : [[typeexp_1]]_{Γ_U,Γ_F}, ..., all x_m : [[typeexp_m]]_{Γ_U,Γ_F},
    [[E_pre]]^exp_{Γ_U,Γ_F,Γ_O} -> [[E_post]]^exp_{Γ_U,Γ_F,Γ_O} ;

where pre_post_ident is an identifier that we assign to S_j, x_1 ... x_m are bound variables with x_1 = lower(p_1), ..., x_m = lower(p_m), [[typeexp_i]]_{Γ_U,Γ_F} is the transformation of the variable types, and [[E_pre]]^exp_{Γ_U,Γ_F,Γ_O} ([[E_post]]^exp_{Γ_U,Γ_F,Γ_O}) is the transformation into FoCaLiZe of the OCL expression describing the pre-condition (post-condition).

Figure 2: Proof Framework

5. A framework for formal proofs

In general, we adopt the following proof process (illustrated in Figure 2) to prove the properties derived from OCL constraints:
1) We complete the FoCaLiZe specifications by implementing all related functions.
2) We introduce proof indications using the FoCaLiZe proof language.
3) We compile the FoCaLiZe source using the command focalizec, which invokes the command zvtov to achieve the proofs with Zenon.

From a UML/OCL model, an abstract FoCaLiZe specification is generated. Then, a FoCaLiZe user only needs to complete the specification by implementing all derived methods to obtain a complete specification. Finally, when compiling the FoCaLiZe source, proof obligations are generated.
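The pre/post rule above can be sketched in the same toy Python style as the invariant rule (again an illustration under simplifying assumptions, not the paper's implementation): the bound variables are quantified after Self, and the pre-condition implies the post-condition.

```python
# Toy rendering of the pre/post transformation rule:
# "context c::OP(p1:t1,...) pre: Epre post: Epost" becomes a FoCaLiZe
# implication quantified over Self and the (lower-cased) parameters.
def translate_pre_post(ident, params, pre_f, post_f):
    # params: list of (bound_var, focalize_type) pairs, i.e. the x_i and
    # [[typeexp_i]]; pre_f / post_f stand for [[E_pre]] and [[E_post]].
    binders = "".join(f", all {x} : {t}" for x, t in params)
    return (f"property {ident} : all e : Self{binders}, "
            f"{pre_f} -> {post_f};")

print(translate_pre_post(
    "pre_post_set_age", [("a", "int")],
    "(a >= get_age(e))", "(get_age(set_age(e, a)) = a)"))
# property pre_post_set_age : all e : Self, all a : int, (a >= get_age(e)) -> (get_age(set_age(e, a)) = a);
```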
Table 3: Example of OCL constraints transformation (FoCaLiZe side shown; the corresponding OCL constraints are described in the text)

species Person_constraints =
  inherit PersonSpecies;
  property invariant_person_1 : all x : Self,
    (get_age(x) <= 150) /\ (get_age(x) > 0);
  property pre_post_set_age : all x : Self, all a : int,
    ( (get_age(x) <= 150) /\ (get_age(x) > 0) /\ (a >= get_age(x)) )
      -> (get_age(set_age(x, a)) = a);
  (* "~~" designates the logical negation *)
  property invariant_person_2 : all p1 p2 : Self,
    ~~ (equal(p1, p2)) -> ~~ (get_name(p1) = get_name(p2));
end ;;

If a proof fails, the FoCaLiZe compiler indicates the line of code responsible for the error. In this case, the FoCaLiZe developer analyses the source in order to correct and/or complete the UML model, and then restarts the development cycle. There are two main kinds of errors: either Zenon could not find a proof automatically, or there are inconsistencies in the original UML/OCL model. In the first case developer interaction is needed to give appropriate hints to prove the properties, while in the second case we must go back to the original UML/OCL model to correct and/or complete it. OCL errors are detected when proving properties derived from OCL constraints, since the proof of such properties may use conflicting properties (derived from other OCL constraints) as axioms. Also, thanks to the direct transformation in our approach (FoCaLiZe and UML share the same feature semantics), UML errors such as the violation of UML template semantics, inheritance semantics or dependency semantics, or the specification of incorrect values or types, are automatically detected when the FoCaLiZe compiler checks the derived source.

Proof example. We present here the proof of an invariant of the class Person (presented in Table 3). After derivation of the species PersonSpecies (from the class Person), we generate the species Person_constraints, which inherits from PersonSpecies and contains the theorems and properties derived from the OCL constraints (for brevity, we only focus on the property invariant_person_2):

species Person_constraints =
  inherit PersonSpecies;
  let get_name (p : Self) : string = fst(p);
  let equal (x : Self, y : Self) : bool = (get_name(x) = get_name(y));
  theorem invariant_person_2 : all p1 p2 : Self,
    ~~ (equal(p1, p2)) -> ~~ (get_name(p1) = get_name(p2))
  proof = by definition of equal;
  ...
end ;;

The symbol ~~ represents the logical negation in FoCaLiZe. In this example, we ask Zenon to find a proof using the hint "by definition of equal".
Finally, the compilation of the above FoCaLiZe source ensures the correctness of the specification. If no error occurs, this means that the compilation, code generation and Coq verification were successful.

6. Conclusion and perspectives

In this paper, we presented, first, an approach to map UML classes and OCL constraints into a FoCaLiZe specification. A FoCaLiZe species is derived from each UML class. The representation (carrier type) of the derived species is a record that represents the state of the object (its instance variables). Class operations are converted into functions of the derived species. OCL constraints (class invariants, pre-conditions and post-conditions) correspond to FoCaLiZe properties. UML conceptual and architectural features are supported through similar features in FoCaLiZe. Second, we presented the general approach to prove the derived properties based on Zenon. If Zenon cannot find a proof automatically, the FoCaLiZe user has to provide proof hints, using the FoCaLiZe proof language, to help Zenon achieve the proofs. The presented approach gives rise to certified software after proving all properties derived from the original OCL constraints.

The presented work supports most UML features, such as encapsulation, inheritance, late-binding and the bind relationship, which permits deriving a formal specification expressed through a species hierarchy that matches the original UML model. OCL constraints are then naturally converted into properties of the derived species. To implement the presented approach, we propose to use the XMI technology (XML Metadata Interchange) through a UML tool that supports both UML2 constructs and OCL constraints, such as the UML2 Eclipse plug-in. We parse the XMI document to translate it into the proposed UML syntax (using an XSLT stylesheet), so that it is possible to apply the transformation rules that we have proposed for each UML construct.
The correctness of the transformation is ensured by the proposed formal rules. As future work, we will extend our mapping to deal with larger subsets of OCL. In particular, we will deal with the sub-types of the general collection type Collection(T), which are Set(T), OrderedSet(T), Bag(T) and Sequence(T). In addition to the common operations of the species OCL_Collection, each sub-type has its own operations. We shall also deal with the operations of conversion from one sub-type to another, such as asSequence, asSet and asOrderedSet.

References

[1] OMG: UML: Superstructure, version 2.4 (January 2011). Available at: 4/Infrastructure.

[2] OMG: OCL: Object Constraint Language (January 2012). Available at: OCL.
[3] OMG: Object Management Group. org/.
[4] Marcano, R., Levy, N.: Using B formal specifications for analysis and verification of UML/OCL models. In: Workshop on Consistency Problems in UML-based Software Development, 5th International Conference on the Unified Modeling Language, Citeseer (2002)
[5] Ledang, H., Souquières, J., Charles, S.: Un outil de transformation systématique de spécification UML en B. AFDL (2003)
[6] Truong, N., Souquières, J., et al.: Validation des propriétés d'un scénario UML/OCL à partir de sa dérivation en B. Proc. Approches Formelles dans l'Assistance au Développement de Logiciels, France (2004)
[7] Hazem, L., Levy, N., Marcano-Kamenoff, R.: UML2B: un outil pour la génération de modèles formels. AFDL (2004)
[8] Cabot, J., Clarisó, R., Riera, D.: UMLtoCSP: a tool for the formal verification of UML/OCL models using constraint programming. In: Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, ACM (2007)
[9] Clavel, M., Egea, M.: ITP/OCL: A rewriting-based validation tool for UML+OCL static class diagrams. Algebraic Methodology and Software Technology (2006)
[10] Clavel, M., Egea, M.: The ITP/OCL tool (2008)
[11] Beckert, B., Hähnle, R., Schmitt, P.: Verification of Object-Oriented Software: The KeY Approach. Springer-Verlag (2007)
[12] Brucker, A.: An interactive proof environment for object-oriented specifications. PhD thesis, ETH Zurich (2007)
[13] Brucker, A., Wolff, B.: The HOL-OCL tool (2007)
[14] Clavel, M., Egea, M., García de Dios, M.: Checking unsatisfiability for OCL constraints. Electronic Communications of the EASST 24 (2010)
[15] Durán, F., Gogolla, M., Roldán, M.: Tracing properties of UML and OCL models with Maude. arXiv preprint (2011)
[16] Cunha, A., Garis, A., Riesco, D.: Translating between Alloy Specifications and UML Class Diagrams Annotated with OCL Constraints.
Software & Systems Modeling (2013) 1-21
[17] Anastasakis, K., Bordbar, B., Georg, G., Ray, I.: UML2Alloy: A Challenging Model Transformation. In: Model Driven Engineering Languages and Systems. Springer (2007)
[18] Hardin, T., Francois, P., Pierre, W., Damien, D.: FoCaLiZe: Tutorial and Reference Manual. CNAM/INRIA/LIP6 (2012). Available at: http://
[19] Ayrault, P., Hardin, T., Pessaux, F.: Development Life-Cycle of Critical Software under FoCal. Electronic Notes in Theoretical Computer Science 243 (2009)
[20] Doligez, D.: The Zenon Tool. Software and documentation freely available at zenon/.
[21] Coq: The Coq Proof Assistant, Tutorial and Reference Manual. INRIA-LIP-LRI-LIX-PPS (2010). Distribution available at:
[22] Fechter, S.: Sémantique des traits orientés objet de Focal. PhD thesis, Université Paris 6 (July 2005)
[23] Delahaye, D., Étienne, J., Donzeau-Gouge, V.: Producing UML models from Focal specifications: an application to airport security regulations. In: Theoretical Aspects of Software Engineering, TASE'08, 2nd IFIP/IEEE International Symposium on, IEEE (2008)
[24] Abbas, M., Ben-Yelles, C.B., Rioboo, R.: Modeling UML Template Classes with FoCaLiZe. In: Integrated Formal Methods. Volume 56, Springer (2014)
[25] Dubois, C., Hardin, T., Donzeau-Gouge, V.: Building certified components within FOCAL. Trends in Functional Programming 5 (2006)

Design of Smart Grid System in the South of Algeria

Dr. Benahmed Khelifa, Dept. of Science, University of Bechar, Bechar, Algeria
Douli Amel, Dept. of Science, University of Bechar, Bechar, Algeria

Abstract — The increasing complexity of the classical grid due to population growth, the promotion of technology and infrastructure are factors that greatly contribute to instability, insecurity and inefficiency in the use of electrical energy. To address energy sustainability, it is better to use renewable energy for sustainable and economical electricity. Smart grids promise to improve the performance of the grid system. The energy sector has adopted smart grids that use information and communication technology, which can make power systems more reliable and more efficient. The south of Algeria meets all the conditions: it has large areas, renewable energy, and wireless communication, which are all essential for a smart grid system. So in this paper a new smart grid system is proposed for the south of Algeria, based on renewable energy, wireless communication and smart technologies.

Index Terms — Smart grid; renewable energy; smart technologies; wireless sensor networks.

I. INTRODUCTION

One of the basic challenges of the electrical system is meeting real power demand in an absolutely reliable way. Historically this challenge led to a power system based on highly controllable supply matching a largely uncontrolled demand. However, global climate change and population growth in recent decades have generated an increasing demand for abundant, clean and sustainable electrical energy in the world. Today, in most countries, the growing demand for energy places a very heavy burden on electricity infrastructure that is already too old and fragile; for example, in Algeria, conditions such as heat waves create electrical demand that the current fragile grid cannot support.
Thus, the increasing complexity of classical power grids, growing demand, and requirements for greater reliability, safety, efficiency, environmental protection and sustainable energy call for information and communication technologies. This leap toward a smarter grid is widely referred to as the smart grid (SG). The basic concept of the SG is to add monitoring, analysis, control, and communication capabilities to the existing power grids. Algeria is endowed with various types of renewable energies, namely solar energy, wind energy, hydro energy and biomass energy [1]. It also meets many other conditions, such as large areas and wireless communication techniques (3G networks, GPRS, etc.); it also has competent researchers in this regard and the ability to finance the smart technologies, which are all necessary for a smart grid system. So in this paper, a solution for the integration of a smart electricity network is proposed, using advanced technologies to monitor and manage the transmission of data and electricity, that is, a smart grid. This smart grid integration coordinates the roles of all domains to operate all parts of the system as efficiently as possible, minimizing costs and environmental impacts while maximizing system reliability, resilience and stability.

The rest of this paper is organized as follows: related works on smart grids in Section III; a brief review of the current situation in Algeria in Section IV; the sources of energy in Algeria in Section V; then we describe our smart grid approach in Section VI. Finally, we summarize the paper.

II. ACRONYMS

TABLE I. ACRONYMS

Acronym   Definition
AMI       Automatic Metering Infrastructure
AMR       Automatic Meter Reading
DMS       Distribution Management System
DR        Demand Response
EMS       Energy Management System
ESI       Energy Services Interface
GPS       Global Positioning System
IED       Intelligent Electronic Device
MDMS      Metering Data Management System
III. RELATED WORKS

The development of smart grids has attracted considerable interest from fields as diverse as economics, sociology and electrical engineering [2]. Clastres [3] provided a preliminary overview of possible solutions to encourage the emergence of new smart grid technologies. Nair and Zhang [4] reviewed the development of electricity grids in New Zealand and identified future opportunities and policies concerning intelligent grids. Mah et al. [5] examined the motivations, processes and outcomes of the development of smart grids in South Korea through the perspectives of governance and innovation

systems. Ipakchi [6] presented some emerging smart grid applications and discussed information system requirements for a broad-based application to the smart grid. Zio and Aven [7] looked into the future world of smart grids from a rather different perspective of uncertainties and the related risks and vulnerabilities. Smart grid technologies, including power generation, transmission, transformation, distribution, storage, and consumption, the focus of smart grid development, have been widely discussed. Järventausta et al. [8] investigated the general aspects of smart grids and focused on smart grid features at the distribution level, such as the interconnection of distributed generation and active distribution management, the use of automated meter reading (AMR) systems in network management and power quality monitoring, the application of power electronics in electricity distribution, plug-in vehicles as part of smart grids, and frequency-based load control as examples of interactive customer gateways. Blasques and Pinho [9] proposed a demand-side management model integrated with a metering system for hybrid renewable energy systems in a micro-grid configuration. Depuru et al. [10] discussed various features and technologies that could be integrated with a smart meter, outlined various issues and challenges involved in the design, deployment, utilization, and maintenance of the smart meter infrastructure, and explained the importance of introducing smart meters in developing countries. The status of smart metering in various countries was also illustrated. Wissner [11] gave an overview of the options that information and communication technology (ICT) offers for the restructuring and modernization of the German power system.

IV. BRIEF REVIEW OF THE CURRENT SITUATION

Algeria is amongst the top five countries in the world for producing natural gas and in the top ten for oil production [12].
For a very long time now these have been the main export revenue for the country, currently accounting for more than 95% of the country's total export revenues [13]. Unfortunately, however, this has made the country heavily reliant on hydrocarbons, with 94% of energy currently coming from natural gas, which also represents approximately 50% of the country's GDP according to recent reports in the national papers. Latest estimates claim that about 5% of the country's electricity comes from hydropower plants, while only a tiny portion of 0.5-1% is from the renewable sources of solar and wind [14]. On the contrary, other countries are already well ahead in adopting renewable energy. Unfortunately, the lack of renewable energy development in Algeria occurs despite the country's favorable geographical location, which offers one of the highest solar potentials in the world. Thus, Algeria faces a real challenge between its dependence on fossil fuels and its huge potential for exploiting renewable energies. Electrical energy consumption in Algeria will rise to 83 TWh in 2020 and up to 150 TWh in 2030 [13]. According to the study of [14], the projections are that the reserves of oil can only cover the next 50 years, while those of natural gas will only be available over the next 70 years. Luckily, in the last few years there seem to be some ambitious plans to develop renewable energy by aspiring to generate 40% of local electricity production by 2030 from solar and wind [14]. Within this plan the estimated installed capacity is 22,000 megawatts (MW): 12,000 MW for the domestic market and 10,000 MW for possible export. The initial budget is about US$60 billion.

TABLE II. PROGRAM OF PRODUCTION OF ELECTRICITY BASED ON RENEWABLE ENERGY IN THE NATIONAL MARKET FOR ALGERIA [14].
(Table II columns, per year: national park, conventional and renewable (MW); renewable share in the national park (MW); renewable share in the national park (%); national electricity generation (TWh); national renewable (EnR) generation (TWh); national share of EnR generation (%).)

V. SOURCES OF ENERGY

Recent reports, e.g. [15], show that energy usage in Algeria is split mainly between three sectors: industrial use (24%), transport (33%), and residential and services (43%). In terms of the potential contribution of renewable energy in Algeria and possible engineering applications and solutions, it is important to consider the following sources:

a) Solar energy: Algeria resides within the so-called solar belt of the world. To understand the true potential of solar energy, it is estimated that 6 hours of solar power from deserts around the world can meet the annual global energy demands. The potential use of solar energy in Algeria may be split into two categories: solar thermal and photovoltaic (PV). With solar thermal technologies, about 169,440 TWh/year may be harnessed [16].

b) Wind energy: Recent studies show that wind energy is the second most important renewable resource in Algeria [17, 13, 14]. The same studies showed that the strongest winds are located in the southwest regions of the country, particularly near Adrar, which is why Algeria expects to have its first wind farm in this region operational between 2014 and 2015, generating around 10 MW.

c) Geothermal energy: In terms of hydrothermal systems, recent estimates have counted more than 200 hot springs in a number of regions around the country; the best known are located in Guelma, Khenchla, Biskra and Setif [14].

So far these hot springs, or Hammamat, have been used exclusively for physical therapy and pleasure. Their commercial viability has yet to be fully explored. These geothermal resources have both low-grade (low temperature) and high-grade (high temperature) heat, with more than a third having temperatures above 45 °C [14].

d) Biomass energy: Biomass is a general term used to refer to all types of animal and plant material, including municipal solid waste, animal and industrial residues, agricultural and forestry crops/residues, and sewage. These materials can be burned or digested to produce heat and electricity.

e) Shale gas energy: In terms of shale gas, this is seen by many around the globe as the new energy bullet, partly due to its vast availability.

Figure 1 shows the renewable energies in Algeria.

This solution aims to satisfy consumers in a reliable, secure and economical way. It has the following goals:
1. Two-way communication enables consumers to better control their power consumption and offers more choices to the customer.
2. Allow the electrical system to be resistant to failure, using intelligent systems.
3. Improve the reliability of the system.
4. The customer may actually be an energy supplier instead of a consumer of energy.

A. Our study area location

The town of Bechar is situated at the foot of the southern side of the Saharan Atlas, at a distance of 950 km south-west from Algeria's capital (Figure 2). It is bordered to the north by the northern massifs (Djebel Antar, 1960 m, and Djebel Horriet, 1461 m) and Hamada Oum Sbaâ, to the south by Chabket Mennouna, to the east by Djebel Bechar (1500 m), and to the west by the Kenadsa region. The city of Bechar extends over an area of 5050 km²; it is rich in renewable energy, especially solar energy.
Various energy scenarios involve Algeria as a core for generating renewable energy for its environment, especially Europe: World Bank studies say that Europe will have to import energy from the great Sahara in the coming years, hence the German initiative to export solar energy from the great Sahara to Europe [1].

Fig. 1. Renewable energies in Algeria.

VI. OUR PROPOSED SMART GRID APPROACH

The growth of electricity demand causes an imbalance in the present grid system resulting from various causes, such as load shedding, voltage unbalance, etc., which ultimately affect consumers. As we mentioned above, our country meets all the necessary conditions to accelerate the development of the current grid. Smart grids promise to improve the performance of the conventional electricity system using information and communication technology and renewable energy, which can make the power supply more reliable and efficient. So, in this regard, we propose a smart grid architecture that will be useful to apply in the south of Algeria.

Fig. 2. Our study area location (Bechar, Algeria).

B. Smart Grid architecture

Figure 3 shows the simplified structure of the power system in the way it is presently organized. Conventional electric power is complemented by renewable energy (solar panels, wind turbines, etc.), while the transmission system already includes high-voltage systems from the established power grid. But in times of scarce renewable resources, when electricity is generated by the use of resources such as oil, coal or nuclear fission, we have to store electricity to manage the variability of renewable resources; as a result we use bulk generation. The bulk generation domain is related to the transmission domain. It also communicates, through an Internet service interface and with the operation domain over the WAN, with the electric utility office, which balances the supply and the demand for electricity.

The generated electricity is transmitted to the distribution domain via multiple substations and transmission lines. The dispatch of electricity to end users in the customer domain is implemented by making use of the wired and wireless communication infrastructures that connect the transmission and customer domains.

Fig. 3. Smart grid architecture.

To obtain information about power system activities like monitoring, control, fault management, maintenance, analysis and metering, the operation domain uses home area networks (HAN) and wide area networks (WAN) in the transmission and distribution domains, by means of SCADA systems. Electricity is provided to customers and utilities through service providers, who manage services like billing and customer account management for utility companies. The service provider communicates with the operation domain to get the metering information, and with the control service for situational awareness, in order to provide smart services like management of energy use; for instance, on the one hand the service provider informs the customer to reduce the use of smart devices which exceed the threshold of electric consumption. On the other hand, it manages the energy generated in the nano grid through a smart meter; as a result the customer is able to consume, generate or store electricity. Finally, our aim is to satisfy the customer's demands and needs in an intelligent and reliable way. In this paper, we present our conceptual model, which focuses on the essential components to ensure a sound establishment of a smart grid in our study area (the south of Algeria).

C. System components

This system consists of many functional blocks, namely: renewable energy, bulk generation, transmission domain, distribution domain, operation domain, nano grid, micro grid and service provider.

a. Renewable energy: Algeria and the Maghreb countries in particular have a high solar potential.
Solar irradiation measurements made by satellites of the German Space Agency (DLR) show exceptional sunshine levels, on the order of 1200 kWh/m²/year in the northern great Sahara. By contrast, the best rate of solar radiation in Europe, limited to its southern part, is on the order of 800 kWh/m²/year. Following an evaluation by satellites, the German Space Agency concluded that Algeria has the most important solar potential of all the Mediterranean, namely: TWh/year for solar thermal, 13.9 TWh/year for solar photovoltaic and 35 TWh/year for wind power [18].

b. Bulk generation: This domain can store electricity for managing variability in renewable resources; for instance, excess electricity generated at times of rich resources may be stored for redistribution in periods of resource scarcity.

c. Transmission: The electricity generated is transmitted to the distribution domain through several substations and transmission lines. The transmission domain can also support small-scale energy production and storage. It also handles the bidirectional communications between control centers and substations.

d. Distribution: The distribution domain takes the responsibility of providing electricity to consumers based on user requests and the availability of energy. The stability of this domain is monitored and controlled in order to provide quality electricity.

e. Service provider: Electricity is provided to customers and utilities through service providers; this domain also provides smart services like management of energy use and home energy generation.

f. Operation: Hosting power system control operations in the respective domains, e.g. distribution management systems (DMS), energy management systems (EMS) in generation and transmission systems, micro grid management systems, virtual power plant management systems (aggregating several DER), and electric vehicle (EV) fleet charging management systems.

g.
Customer: This domain includes homes and commercial or industrial buildings. It is electrically connected to the distribution domain and communicates with the distribution, operation, service provider and electric utility office domains. Customers consume, generate (using DERs), or store electricity.

h. Micro-grid: The micro grid, illustrated in Figure 4, is an electrical system that includes multiple nano grids, distributed energy resources and a control system. It can be operated in parallel with the other domains, using wireless

communication that transmits data via the smart meters and cellular networks, and wired communication which represents electrical transmission.

i. Nano-grid: The modern electricity network will increasingly rely upon a set of intelligent communication and control technologies like the home area network (nano-grid). Within the smart grid network, the Home Area Network (HAN) helps develop demand response and demand-side management. Home area networks can be implemented via both wired and wireless solutions, using multiple different standards, and can be remotely controlled and monitored through a gateway to neighbor, wide area or smart grid networks. The smart home works together with the smart meter and the smart grid; it is the fundamental unit of consumer interaction with the utility. A home area network is a dedicated network connecting devices in the home, such as displays, load control devices and ultimately smart appliances, seamlessly into the overall smart metering system. It also contains turnkey reference designs of systems to monitor and control these networks. Most of our high energy use today comes from heating/cooling, cooking, lighting, washing and drying. These home appliances are beginning to become smarter, with connectivity features that allow them to be automated in order to reap the benefits that smart metering and variable tariffs bring. In a home area network, multiple components interact to provide a wide range of capability. The basic components of a home area network are: the network portal or gateway that connects one or more outside information services to the home area network; the access points or network nodes that form the wired or wireless network itself; the network operating system and network management software; and the end points such as thermostats, meters, in-home display devices, and appliances.

Smart meter: The smart meter is one of the important components of smart grids in the distribution system.
The older generations of smart meters were ones that allowed AMR. This has now evolved into AMI, where the meter not only measures, stores, and communicates loads and other power statistics in real time, but can also be a point of control (through connect/disconnect capabilities) and of signaling to consumers and their devices for load control [19]. Smart meters can be read automatically, without sending a meter reader out once a month. This can be done in several different ways: with a signal that is sent back to the transformer or substation over the power line (power line carrier) and then on to the utility in some other way; by a radio link in the local neighborhood; or by a van that drives around the neighborhood and asks each meter to give an automatic readout, via GSM/data radio links. Wireless sensor communication: The network topology is the mesh topology. A mesh network is a flexible network consisting of a group of nodes, where new nodes can join the group and each node can act as an independent router. The self-healing network feature allows communication signals to find an alternate route via the active nodes if a node must leave the network. The communication between smart meters and smart appliances in the home is very important. Many AMI vendors, such as Itron, Elster, and Landis+Gyr, prefer smart meters into which the ZigBee protocol can be integrated [20]. ZigBee is a wireless communication technology that is relatively low in power consumption, data rate, complexity and cost of deployment. It is an ideal technology for lighting, smart energy monitoring, home automation, automatic meter reading, etc.; it enables utilities to send messages to home owners, and home owners can access information about their real-time energy consumption. Existing cellular networks can be a good option for communication between smart meters and the utility and between far nodes.
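The self-healing mesh behavior described above can be illustrated with a small sketch (a hypothetical five-node topology, not from the paper): when a node leaves the mesh, a breadth-first search over the remaining active nodes finds an alternate route to the concentrator.

```python
from collections import deque

def find_route(links, src, dst, active):
    """Breadth-first search for a route from src to dst using only active nodes."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in links.get(node, ()):
            if nxt in active and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route through the remaining active nodes

# Hypothetical mesh: smart meters M1..M4 relaying to a data concentrator C.
links = {
    "M1": ["M2", "M3"],
    "M2": ["M1", "C"],
    "M3": ["M1", "M4"],
    "M4": ["M3", "C"],
    "C":  ["M2", "M4"],
}
active = set(links)
print(find_route(links, "M1", "C", active))   # -> ['M1', 'M2', 'C']
active.discard("M2")                          # node M2 leaves the mesh
print(find_route(links, "M1", "C", active))   # -> ['M1', 'M3', 'M4', 'C']
```

The rerouting around the failed node M2 is exactly the "alternate route via the active nodes" feature the text attributes to mesh networks.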
Cellular network solutions also enable smart metering deployments spreading to a wide area environment. Fig. 4. Nano-grid architecture. Power line communication (PLC): PLC is a technique that uses the existing power lines to transmit high-speed (2-3 Mbps) data signals from one device to another. PLC has been the first choice for communication with the electricity meter, due to the direct connection with the meter [21], and for successful implementations of AMI in urban areas where other solutions struggle to meet the needs of utilities. PLC systems based on the LV distribution network have been one of the research topics for smart grid applications in China [22]. In a typical PLC network, smart meters are

connected to the data concentrator through power lines, and data is transferred to the data center via cellular network technologies. For example, any electrical device, such as a power line smart transceiver-based meter, can be connected to the power line and used to transmit the metering data to a central location [20]. VII. MOTIVATION AND BENEFITS The smart grid is an integrated approach to transform utilities to a future state. It requires the coordination of advanced technologies, business processes and people. It will be a gradual transformation of the systems that have served us for many years into a more intelligent, more effective and environmentally sensitive network to provide for our future needs. Smart grids in Algeria have enormous potential to: integrate renewables, reducing the nation's dependence on coal; give customers more information on how to manage their electricity use; increase the efficiency of our power grids; enable customers to handle their electricity consumption via the internet; enable the consumer to participate in the selling of electric power; ensure the safety and security of energy efficiency and home entertainment features; and open several scientific research fields for researchers. VIII. SMART GRID RESEARCH TOPICS This section summarizes only the top priorities among the smart grid research topics. Fig. 5. Grid research topics. Smart grids open many research themes, such as communication protocols, security and self-healing, to ensure the proper functioning of the network, improve safety performance and avoid faults on the grid in future complex scenarios, as well as power electronics technologies. Open questions include: What are the metrics for stability in the future (maybe not frequency and voltage)? How can mostly self-organized systems be stabilized? What central actors will be needed?
Which parts of the system should be fully self-organized, and where is central control needed? Cloud technology is of great relevance for the electricity sector and associated energy carriers. Priority should be given to the following questions: What are the regulatory options to create and allow for creativity in the market on the one hand, and economic stability and security of supply on the other? How can the regulatory framework proceed and adapt in an uncertain environment? There is a need for further development regarding the volatility of the grid and the corresponding response times of distributed ICT (Information and Communications Technology) systems. IX. CONCLUSION Algeria is well placed to continue to be a major player in the lucrative market of renewable energies. However, the transition to renewable energy will need to be accelerated. It is hoped to use Algerian renewable energies in an integrated smart electric system, or smart grid. In this paper we tried to present the whole system structure, which represents the integration of smart grid systems with renewable energy in the south of Algeria. Smart grid technology can control renewable resources to effect changes in the grid's operating conditions and can provide additional benefits as distributed generation assets or when installed at the transmission level. So this paper aims to: increase understanding of the costs and benefits of smart grids, and identify the most important measures to develop the technologies of smart grid systems that help meet energy and climate goals. REFERENCES [1] S. Bentouba, «Les énergies renouvelables dans le cadre d'un développement durable en Algérie, wilayas du grand sud, exemple», University of Bechar, Algeria, 11-12 November. [2] Coll-Mayor, D.; Paget, M.; Lightner, E. Future intelligent power grids: Analysis of the vision in the European Union and the United States. Energy Policy, 35. [3] Clastres, C.
Smart Grids: Another step towards competition, energy security and climate change objectives. Energy Policy, 39. [4] Nair, N.K.C.; Zhang, L. Smart Grid: Future networks for New Zealand power systems incorporating distributed generation. Energy Policy, vol. 37. [5] Mah, D.N.; van der Vleuten, J.M.; Ip, C.J.; Hills, P.R. Governing the transition of socio-technical systems: A case study of the development of smart grids in Korea. Energy Policy, vol. 45.

[6] Ipakchi, A. Implementing the Smart Grid: Enterprise information integration. In Proceedings of the Grid-Interop Forum, Albuquerque, NM, USA, 7-9 November. [7] Zio, E.; Aven, T. Uncertainties in Smart Grids behavior and modeling: What are the risks and vulnerabilities? How to analyze them? Energy Policy, vol. 39. [8] Järventausta, P.; Repo, S.; Rautiainen, A.; Partanen, J. Smart Grid power system control in distributed generation environment. Annu. Rev. Control, vol. 34. [9] Blasques, L.C.M.; Pinho, J.T. Metering systems and demand-side management models applied to hybrid renewable energy systems in micro-grid configuration. Energy Policy, vol. 45. [10] Depuru, S.S.S.R.; Wang, L.; Devabhaktuni, V. Smart meters for power grid: Challenges, issues, advantages and status. Renew. Sust. Energy Rev. 2011, 15. [11] Wissner, M. The Smart Grid - A saucerful of secrets? Appl. Energy, vol. 88. [12] IEA Key Energy Statistics. [13] A. B. Stambouli, Algerian renewable energy assessment: the challenge of sustainability, Energy Policy, vol. 39 (8). [14] A. B. Stambouli, Z. Khiat, S. Flazi, Y. Kitamura, A review on the renewable energy development in Algeria: Current perspective, energy scenario and sustainability issues, Renewable and Sustainable Energy Reviews, vol. 16. [15] A. B. Stambouli, Algerian renewable energy assessment: the challenge of sustainability, Energy Policy, vol. 39 (8). [16] A. B. Stambouli, Z. Khiat, S. Flazi, Y. Kitamura, A review on the renewable energy development in Algeria: Current perspective, energy scenario and sustainability issues, Renewable and Sustainable Energy Reviews, vol. 16. [17] S. Menani, Algeria renewable energy program: outlook and applications, Energy Week Conference, Vaasa, Finland. [18] I. A.
Zino, Renewable Energies in Algeria, Arab Climate Resilience Initiative: Climate Change Impacts in the Arab Region - Towards Sustainable Energy: Resources, Challenges and Opportunities, Manama, Bahrain, October. [19] Y. Himri, S. Himri, A. Boudghene Stambouli, Wind power resources in the south-western region of Algeria, Renewable and Sustainable Energy Reviews, vol. 14. [20] M. L. Lamy, «Efficacité des politiques environnementales d'incitation à l'adoption de nouvelles techniques : le cas des énergies renouvelables», Thèse de doctorat, Grenoble. [21] C. Thurler, Smart grid and automatic meter management: dream or reality?, in 19th Int. Conf. on Electricity Distribution, Vienna, Paper 0653, May. [22] V.C. Gungor, D. Sahin, T. Kocak, and S. Ergüt, Smart Grid Communications and Networking, Türk Telekom Technical Report, April. [23] R. P. Lewis, P. Igic and Z. Zhongfu, "Assessment of communication methods for smart electricity metering in the U.K.," in Proc. of IEEE PES/IAS Conference on Sustainable Alternative Energy (SAE), pp. 1-4, Sept. [24] M.Y. Zhai, "Transmission Characteristics of Low-Voltage Distribution Networks in China Under the Smart Grids Environment," IEEE Trans. on Power Delivery, vol. 26, no. 1, Jan.

Topic 2: Security & Access Management

Swarm intelligence algorithms in cryptanalysis of Simple Feistel Ciphers. T. Mekhaznia, LAMIS Laboratory, University of Tebessa, Algeria; A. Zidani, Department of Computer Science, University of Batna, Algeria. Abstract - Recent cryptosystems constitute a hard task for cryptanalysis algorithms due to the nonlinearity of their structure. The problem can be formulated as NP-hard. It has long been subject to various attacks; related results remain insufficient, especially when handling wide instances, due to resource requirements which increase with the size of the problem. On the other hand, optimization algorithms inspired by swarm intelligence represent a set of approaches characterized by fast convergence and limited resource requirements. The purpose of this paper is to provide, for the first time, a more detailed study of the performance of two swarm intelligence algorithms, the BAT algorithm and the Wolf Pack Algorithm, for cryptanalysis of some basic variants of Feistel ciphers within limited computer resources. Experiments were carried out in order to study the effectiveness of these algorithms in solving the considered problem. Moreover, a comparison with the PSO and GA algorithms establishes this advantage. Keywords - Cryptanalysis, Feistel ciphers, BAT algorithm, WPA algorithm. I. INTRODUCTION Cryptanalysis is the art of analyzing encrypted information, ciphers and related concepts in order to detect their weaknesses. It permits retrieval of the meaning of encrypted information without necessarily knowing the secret data normally used for decryption, such as keys or encryption algorithms. This presents a hard challenge for research in data security, especially with the growth of communication networks in recent years, where the security of information has become essential.
The principle of cryptanalysis lies in the use of the correct mathematical tools necessary to provide the right attack [1], which allows evaluating the efficiency of cryptosystems and, therefore, building more robust algorithms. Brute force is the most popular attack; it considers all alternatives that lead to the solution. It is a sure technique, but it needs considerable resources and thus has little success in practice. Other alternatives available in the literature, such as linear and differential cryptanalysis, are able to break a wide variety of ciphers; nevertheless, given their reduced setting, they remain ineffective against modern cryptosystems. Feistel ciphers are algorithms for symmetric key encryption which operate on large blocks of data. They are built on nonlinearity and low autocorrelation. The Data Encryption Standard is a popular Feistel algorithm characterized by its simplicity of implementation, high speed of encryption [2] and resistance against various attacks [3]. Current research in this area intends to use heuristic algorithms, which may be an efficient way to break complex ciphers. Swarm intelligence algorithms are well-known metaheuristics that have been successfully used to solve cryptanalysis problems with minimal resource consumption [4]. Many works [5] [6] have shown that swarm intelligence algorithms have the potential to handle wide instances and may be adapted to produce approximate solutions for a large variety of optimization problems. They use an intelligent system that offers independence of movement of agents and tends to replace preprogramming and centralized control. In recent years, many such algorithms have emerged [7]. In this paper, two swarm intelligence algorithms, the BAT algorithm and the Wolf Pack Algorithm (WPA), are used, for the first time, for breaking encryption keys used in some variants of Feistel ciphers.
The main reason for using these algorithms is their global solution finding property: they use few parameters, without any need for an initial approximation of unknown parameters. Also, they have been successfully applied in a wide range of research application areas, and it has been shown that they obtain better results in a faster and cheaper way compared with other methods. II. RELATED WORKS In recent years, several studies have been carried out in this direction: [8] showed that PSO is more efficient than GA for cryptanalysis of 4DES. The same algorithm was used by [9] for cryptanalysis of DES; the results revealed most bits of the used keys. [10] used a hybrid algorithm, GSO, to attack DES and AES keys; the experiments were able to break the DES key. [11] showed that BPSO is an effective tool for the cryptanalysis of Feistel ciphers. [12] revealed that PSO may be a powerful tool for cryptanalysis of DES. [13] showed that evolutionary computation may be efficient for cryptanalysis but depends on the nature of the objective function. [14] used computational algorithms to solve many cryptographic

problems; he proved that the problem formulation is a critical determinant of the performance of computational intelligence methods, which constitute a first measure of the security of cryptosystems. [15] [16] proved that the BAT algorithm can be used successfully to solve a wide category of optimization problems. III. SWARM INTELLIGENCE HEURISTICS Introduced by [17], swarm intelligence heuristics (SI, in short) are nature-inspired metaheuristics which can satisfy at least one of the two goals of information technology research, namely the generation of solutions with maximum benefit using a minimum of resources; however, no proof of the optimality of the solution can be given, and the search becomes useless if, in an exploration space, a cross between the local and global solution occurs [18]. SI heuristics use a basic population of individuals which represent candidate solutions. They evolve through a decentralized and self-organized system according to a common rule. This principle allows handling very large solution spaces, but with no guarantee that an optimal solution will ever be found. Nature-inspired heuristics, part of swarm intelligence heuristics, are stochastic algorithms which take their inspiration from the social behavior of animals living in large communities, such as bird flocking, fish schooling or ant colonies. They are based on a population of individuals that interact and evolve according to natural instinct. This principle is used to produce algorithms able to solve complex tasks without centralized control. IV. BAT ALGORITHM The BAT Algorithm [19] is a population-based metaheuristic approach inspired by the hunting behavior of bats. In their flight, in order to avoid obstacles and detect prey or their roosts in the dark, bats emit a bio-sonar [20] throughout their environment; the echo bounces permit them to identify the kinds of surrounding objects.
Studies [21] show that the loudness of the emitted pulse varies from a low rate with a long duration of sounds when exploring the hunt area, to loudest with a decreasing duration of sounds when homing toward prey. Similar to other nature-inspired algorithms such as Ant Colony Optimization, Genetic Algorithms and Particle Swarm Optimization, BAT was first implemented for continuous optimization problems, where possible solutions are represented by bats' positions, and was subsequently extended to the resolution of discrete problems. The principle of BAT evolution is illustrated by the following steps: In a search space R^n, at time t, each bat i has a position x_i^t and a velocity v_i^t in R^n. In their random flight with a constant velocity v, each bat i emits a uniform pulse of frequency f_min. At the perception of a prey, parameters are adjusted depending on the distance to the prey according to the following relations:

f_i = f_min + (f_max - f_min) * beta    (1)
v_i^{t+1} = v_i^t + f_i (x_i^t - x_g^t)    (2)
x_i^{t+1} = x_i^t + v_i^{t+1}    (3)

where beta is a random vector distributed in the range [0,1] and x_g is the position of the best bat of the swarm. The algorithm illustrating BAT evolution is as follows:

Algorithm 1. BAT
Input: problem dimension (N > 2, g), objective function S.
Output: S(g), minimizing g.
Initialization:
  Generate x_i, v_i, c_i (i = 1..N_g)
  Evaluate S(x_i) (i = 1..N_g)
  Initialize g in {x_k / S(x_k) = min(S(x_i), i = 1..N_g)}
While not (stopping criteria)
  For each bat i do
    Pick a random number beta in [0,1]
    f_i = f_min + (f_max - f_min) * beta, with f_i in [f_min, f_max]
    v_i <- v_i + f_i (x_i - g), with v_i in [v_min, v_max]
    x_i <- x_i + v_i
    If (S(x_i) < S(g))
      g <- x_i
      Increase pulse rate and reduce loudness
    EndIf
  EndFor
EndWhile
Report g and S(g).

V. WPA ALGORITHM The Wolf Pack Algorithm (WPA, in short) [22] is a recent algorithm used to approximate solutions to various real-world optimization problems. WPA is a population-based metaheuristic inspired by the social foraging behavior of jungle predators.
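As a concrete illustration of Eqs. (1)-(3) and Algorithm 1, the following minimal Python sketch runs the BAT updates on a toy objective (the sphere function); the parameter values here are illustrative, not those used in the paper.

```python
import random

def bat_minimize(S, dim, n_bats=20, f_min=0.0, f_max=2.0, v_max=1.0,
                 iters=200, seed=1):
    """Minimal BAT sketch: frequency-tuned moves toward the best bat (Eqs. 1-3)."""
    rng = random.Random(seed)
    x = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    g = min(x, key=S)[:]                          # best position found so far
    for _ in range(iters):
        for i in range(n_bats):
            beta = rng.random()                   # beta ~ U[0,1]
            f = f_min + (f_max - f_min) * beta    # Eq. (1)
            for d in range(dim):
                v[i][d] += f * (x[i][d] - g[d])   # Eq. (2)
                v[i][d] = max(-v_max, min(v_max, v[i][d]))  # clamp as in Algorithm 1
                x[i][d] += v[i][d]                # Eq. (3)
            if S(x[i]) < S(g):
                g = x[i][:]                       # new best bat
    return g, S(g)

sphere = lambda p: sum(c * c for c in p)          # toy objective
best, cost = bat_minimize(sphere, dim=2)
print("best:", best, "cost:", cost)
```

The sketch omits the loudness/pulse-rate schedule and the local random walk of the full BAT algorithm; only the frequency-tuned velocity update of Eqs. (1)-(3) is shown.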
It basically consists in making animals (wolves) hunt, find the trail of a prey and capture it under the command of a lead wolf. The wolf pack includes a lead wolf, which is the smartest and most ferocious wolf and is responsible for the control of the pack. Its decisions are always based upon the surrounding environment: prey, wolves of the pack and other predators. The pack is managed by two classes of wolves: scout wolves and fierce wolves. The scout wolves move independently in their environment and adjust their direction according to the concentration of the prey's smell. When a prey is located, the scout wolf howls and reports to the lead wolf, which evaluates the situation, summons the fierce wolves and moves fast toward the howl. After a prey is captured, it is distributed in order from the strong to the weak. This causes the death of weak wolves for lack of food, and keeps an active and strong pack at any time. The Wolf Pack algorithm is accomplished by the following steps:

In a search space R^n, each wolf i represents a basic solution of the problem and has a position. Initially, wolves are randomly distributed in the space. At each instant t, the wolf i performs a move from position x_i^t to position x_i^{t+1}. The next position is updated according to the following equation:

x_i^{t+1} = x_i^t + lambda (x_g^t - x_i^t)    (4)

where lambda is a random vector distributed in the range [-1,1] and x_g is the position of the lead wolf. After a fixed number of iterations, which corresponds to a scouting phase, the wolf with the best solution becomes the lead wolf; a certain number of weak wolves (bad solutions) are deleted and replaced by a new generation of random wolves. The algorithm illustrating the WPA movement is as follows:

Algorithm 2. WPA
Input: problem dimension (N > 2, g), objective function S.
Output: S(g), minimizing g.
Initialization:
  Generate x_i (i = 1..N_g)
  Evaluate S(x_i) (i = 1..N_g)
  Initialize g in {x_k / S(x_k) = min(S(x_i), i = 1..N_g)}
While not (stopping criteria)
  it <- 0
  While (it < It_scout)
    Pick a random number lambda in [-1,1]
    For each wolf i do
      x_i <- x_i + lambda (x_g - x_i)
      If (S(x_i) < S(g))
        g <- x_i
      EndIf
    EndFor
    Update it
  EndWhile
  Delete the worst wolf w / S(x_w) = max(S(x_i), i = 1..N_g)
  Generate a new random wolf w
EndWhile
Report g and S(g).

VI. FEISTEL CIPHERS Based on the confusion and diffusion principle, Feistel ciphers are a special class of iterated block ciphers which map a plaintext to a ciphertext by a sequential r-times repetition of a nonlinear transformation (called the round function). In order to produce a ciphertext C from a plaintext M, a Feistel cryptosystem applies, to each block of n bits of M, r iterations of the round function f_i using several keys. Initially, each block is split into two halves, L_0 and R_0, of n/2 bits each. At iteration i, the round function is applied to one half using a subkey, and the output is exclusive-ored with the other half.
The two halves are then swapped, as shown in the following relation:

(L_i, R_i) = (R_{i-1}, L_{i-1} XOR f_i(R_{i-1}, k_i))    (5)

where f_i (i > 0) is a nonlinear function, usually represented as a substitution box (called an S-box). It substitutes an input of n bits with an output of m bits (m < n). The advantage of the algorithm is that the encryption and decryption functions are identical. To reverse a round, it is only necessary to apply the same transformation again, which cancels the changes of the binary XOR operation. Feistel ciphers became the basis for many encryption schemes, among them DES, which is the most popular one. A. Data Encryption Standard The most widely used Feistel block cipher cryptosystem is the Data Encryption Standard (DES, or 16DES in short), which transforms a 64-bit input block, in a series of steps, into an output block of similar size through a 16-round process. The round function f_i uses 8 nonlinear S-boxes, each of them a substitution mapping 6 bits to 4 bits. A detailed description of DES is given in [23] [24] [25]. The strength of DES lies in the nonlinearity induced by the S-boxes; it remains an important model for the construction of secure block ciphers. Also, a brute force attack needs an average of 2^55 tries, which takes such high time complexity that the resources for the search in an acceptable period are not available. Its major weakness is the existence of several weak and semi-weak keys. A weak key is one that, after the parity drop operation, consists of all 0s, all 1s, or half 0s and half 1s. B. Four-Round DES Algorithm The four-round DES algorithm (4DES in short) [26] has the same properties as the DES algorithm. It uses an f_k function which merges substitution, combination and logic addition, taking place in four rounds with four subkeys, using the same expansion and permutation tables (S-boxes) as DES. The 4DES algorithm is less complex than DES but remains difficult to break. C. Simplified DES Algorithm The Simplified DES (SDES in short) is a variant of DES introduced by [27].
It has properties and structure similar to DES, with much smaller parameters. The SDES encryption algorithm takes an 8-bit block of plaintext and a 10-bit key as input, and produces an 8-bit block of ciphertext as output. The decryption algorithm takes an 8-bit block of ciphertext and the same 10-bit key as input, and produces an 8-bit block of plaintext. The encryption algorithm is performed in five steps: (1) an initial permutation; (2) a complex function f_k, in which both permutation and substitution operations are performed based on the key input; (3) a permutation of the two halves of the data; (4) the function f_k again; and (5) the inverse initial permutation. The same steps are followed for the decryption operation, in reverse order. The algorithm uses two reduced S-boxes (size 4x4) and an expansion/permutation table (size 1x8).
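The round structure of Eq. (5) and the encrypt/decrypt symmetry described in this section can be illustrated with a toy Python sketch. The round function and subkeys below are made up for illustration only; this is not DES or SDES.

```python
def feistel_encrypt(block, subkeys, f):
    """One Feistel pass per subkey: (L_i, R_i) = (R_{i-1}, L_{i-1} ^ f(R_{i-1}, k_i))."""
    left, right = block
    for k in subkeys:
        left, right = right, left ^ f(right, k)
    return left, right

def feistel_decrypt(block, subkeys, f):
    """Inverse: the same round structure with the subkeys applied in reverse order."""
    left, right = block
    for k in reversed(subkeys):
        right, left = left, right ^ f(left, k)
    return left, right

# Toy 8-bit round function and subkeys (hypothetical values for illustration).
f = lambda half, key: ((half * 7 + key) ^ (half >> 3)) & 0xFF
subkeys = [0x1A, 0x2B, 0x3C, 0x4D]

plain = (0x55, 0xAA)
cipher = feistel_encrypt(plain, subkeys, f)
assert feistel_decrypt(cipher, subkeys, f) == plain   # decryption undoes encryption
```

Note that the round function f need not be invertible: because each round only XORs f's output into one half, reapplying the same f with the subkeys in reverse order cancels every round, which is exactly the property the text highlights.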

VII. PROBLEM FORMULATION A. Frequency analysis and cost function The letter frequency in a given text differs from one language to another. In English, for example, the most frequent single character (1-gram) is E, with an occurrence of 12% in a text, followed by T with 8%, while Z occurs with only 0.05%. Similarly, the most common groups of two characters (2-grams) are TH, followed by HE, AN, etc. The most common triplet (3-gram) is THE. These statistics have been developed based on corpora [28] [29] and are illustrated in tables called letter frequency tables [30] [31]. TABLE I. Letter frequency table (rows A-Z) for English and French corpora: Thomas Tempé [32], Concise OD [33], Corpus Français [34], Leipzig Corpora [35] (numeric values lost in extraction). Letter frequency analysis is the process of determining at which frequency a letter of the plaintext occurs within the corresponding ciphered text. This gives an idea of the language of the ciphertext and, therefore, allows recovering some of its letters, especially the most frequent ones. The natural way to prove the effectiveness of a candidate key used in decryption is to compare the letter frequency analysis of the decrypted text to the letter frequencies of the language used. The cost function is built upon this idea. It can take different forms, based on several combination schemes of n-gram analysis. Various alternative forms of this function are available in the literature [36] [37] [38] [39]. The most commonly used is given by the following equation:

F(k) = alpha * sum_{i=1..26} |D_u(i) - C_u(i)| + beta * sum_{i,j=1..26} |D_b(i,j) - C_b(i,j)|    (6)

where D and C denote, respectively, the known language statistics and the decrypted text statistics; u and b denote, respectively, 1-gram and 2-gram statistics; alpha and beta (with alpha + beta = 1) are weights assigning different priorities to 1-grams and 2-grams; and k is the key used in the decryption process. B.
Implementation scheme In the search space, each bat/wolf i represents a random initial solution of the problem that corresponds to a decryption key k, a vector of n bits (10 bits in the case of SDES and 56 otherwise); each bit x represents a bat/wolf position in the exploration. At each move from position to position, the bat/wolf i produces a decryption key obtained by swapping the bits x^t and x^{t+1} obtained respectively by equations (3) and (4) in the case of the BAT or WPA algorithms. The performance of each position corresponds to the cost of the text obtained using the decryption key, according to the cost function mentioned above. The process is stopped after a fixed number of iterations or if there is no improvement in the solution. VIII. EXPERIMENTATION AND RESULTS In this section, in order to outline the performance of the proposed algorithms, some experiments were conducted on a set of sample binary texts of 1200 to 2400 bits (150 to 300 ASCII characters) extracted from ICE [40]. The encryption methods used are the SDES, 4DES and DES cryptosystems, using several keys. Each key is a binary vector of 10 bits (in the case of SDES) and 56 bits in the other cases. The algorithms were coded in Matlab 2.14 and run on a 3.2 GHz CPU. The results obtained after carrying out the mentioned experiments are illustrated in the table below, which shows the average number of recovered bits of the encryption key for each algorithm when using a fixed processing time of 150 seconds. TABLE II. Performance evaluation of BAT and WPA algorithms: decryption key size (bits) and recovered bits for BAT and WPA, for SDES, 4DES and DES (numeric values lost in extraction). In this first experiment, we show that both algorithms recover the full key when the SDES encryption algorithm is used, and an average of 50% for the

other cases. Also, the WPA algorithm performs better than BAT in most cases. In order to conduct a comparative study, the same tests were carried out with standard PSO and GA algorithms in a similar environment, particularly in terms of data and processing time. The results are illustrated in the following figure. Fig. 1. Comparison of BAT/WPA vs PSO and GA. In this experiment, we compare the performance of BAT and WPA against PSO and GA for the cipher algorithms used. Figure 1 shows that the WPA algorithm significantly outperforms the GA algorithm. Also, the performance of both BAT and WPA is particularly noticeable in the cases of SDES and 4DES. IX. CONCLUSION In this paper, a comparative study of the BAT algorithm and the WPA algorithm for cryptanalysis of some variants of Feistel ciphers was conducted. The experiments show that these algorithms can be successfully applied to solving such a problem. The produced results show that both algorithms allow locating the full key (in the case of reduced DES) and more than 50% of the key bits (in the other cases) with acceptable resource consumption. The tests were operated on a reduced space of data; however, the results presented can be improved by a good choice of problem factors such as environment parameters, ciphered data and language statistics. In addition, the study may open avenues to investigate the effectiveness of other evolutionary computation techniques for further attacks on more complicated cryptosystems. REFERENCES [1] S. Rao et al. Cryptanalysis of a Feistel Type Block Cipher by Feed Forward Neural Network Using Right Sigmoidal Signals. International Journal of Soft Computing, Vol. 4, No. 3. [2] S. Ali K, Al-Omari, Putra Sumari, 2010. Spiking Neurons with ASNN Based-Methods for the Neural Block Cipher. International Journal of Computer Science & Information Technology, Vol. 2, No. 4. [3] R. Singh, D. B. Ojha, An Ordeal Random Data Encryption Scheme (ORDES). International Journal of Engineering Science and Technology, Vol. 2, No. 11. [4] A.
Gherboudj, S. Chikhi, 2011. A modified HPSO Algorithm for Knapsack Problems. CCIS, Springer. [5] C. Blum, X. Li, 2008. Swarm intelligence in optimization. Natural Computing Series, Springer. [6] T.S.C. Felix, M.K. Tiwari, 2007. Swarm Intelligence: Focus on Ant and Particle Swarm Optimization. I-Tech Education and Publishing. [7] G.S. Sharvani, N.K. Cauvery, T.M. Rangaswamy, Different Types of Swarm Intelligence Algorithms for Routing. International Conference on Advances in Recent Technologies in Communication and Computing. [8] Shahzad, W., Siddiqui, A.B. and Khan, F.A. Cryptanalysis of Four-Rounded DES using Binary Particle Swarm Optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (July 8-12, 2009). ACM, NY. [9] Wafaa, G.A. et al. Known-Plaintext Attack of DES-16 Using Particle Swarm Optimization. In Proceedings of the 3rd World Congress on Nature and Biologically Inspired Computing (NaBIC 2011). [10] Vimalathithan, R., Valarmathi, M.L. Cryptanalysis of DES using Computational Intelligence. European Journal of Scientific Research, 55, 2 (2011). [11] Jadon, S.S. et al. Application of Binary Particle Swarm Optimization in Cryptanalysis of DES. In Proceedings of the International Conference on Soft Computing for Problem Solving (Dec. 2011). Advances in Intelligent and Soft Computing 130. [12] Pandey, S., Mishra, M. Particle Swarm Optimization in Cryptanalysis of DES. Int. J. of Advanced Research in Computer Engineering & Technology, 1, 4. [13] Laskari, E.C. et al. Applying evolutionary computation methods for cryptanalysis of Feistel ciphers. J. Applied Math. and Comp., 184 (2007). [14] Laskari, E.C. et al., 2007. Cryptography and cryptanalysis through computational intelligence. J. Studies in Computational Intelligence, 57 (2007). [15] X. S. Yang and A. H. Gandomi. Bat algorithm: a novel approach for global engineering optimization. Engineering Computations, Vol. 29, No. 5. [16] S. Mishra, K. Shaw, D.
Mishra, 2012.A new metaheuristic classification approach for microarray data.procedia Technology, Vol. 4, pp [17] Beni, G., Wang, J. Swarm Intelligence in Cellular Robotic Systems, Proceed. NATO Advanced Workshop on Robots and Biological Systems, Tuscany, Italy, June (1989). [18] J. Olamaei, T. Niknam, G. Gharehpetian, 2008.Application of particle swarm optimization for distribution feeder reconfiguration considering distributed generators.amc. [19] X. S. Yang A New Metaheuristic Bat-Inspired Algorithm.Nature Inspired Cooperative Strategies for Optimization. [20] D. R. Griffin 1958.Listening in the dark.yale Univ. Press, New York. [21] J. R. Speakman, P. A. Racey The cost of being a bat. Nature V 350, P , [22]W. Hu-Sheng and Z.Feng-Ming Wolf Pack Algorithm for Unconstrained Global Optimization. Mathematical Problems in Engineering, V [23] S. Ghorui& al A simpler and elegant algorithm for computing fractal dimension in higher dimensional state space.pramana Journal of Physics, Indian Academy of Sciences 54(2), L331 L336. [24] W. Stallings, 2004.Cryptography and Network Security Principles and Practices, Pearson Education. [25] A. B. Forouzan Cryptography and Network Security. Tata 51 IT4OD 2014

58 McGraw hill Education, 2nd ed.. [26] E.C. Laskari& al Evolutionary computation based cryptanalysis: A first study, Nonlinear Analysis: Theory, Methods and Applications 63 e823 e830. [27] E. Schaefer, 1996.A Simplified Data Encryption Standard Algorithm, Cryptologia, Vol.20, No.1, pp [28] Robert, L Cryptological Mathematics, The Mathematical Association of America, NY. [29] Nelson, G., Wallis, G. and Bas, A Exploring Natural Languag: Working with the British Component of the International Corpus of English.John Benjamins Publishing Company, Amsterdam. [30] Singh, S The code book: The Evolution of Secrecy from Mary, Queen of Scots, to Quantum Cryptography. Doubleday, New York, NY, USA, 1st edition. [31] Beker, H. and Piper, F Cipher Systems: The Protection of Communications. John Wiley & Sons. [32]. / [33] / article / / Letter-frequency-English. [34] [35] [36] Jakobsen, T. and Knudsen, L.R Attacks on block ciphers of low algebraic degree, J. of Cryptology. 14,3, [37] Nalini, N. and Raghavendra, G Cryptanalysis of Simplified Data Encryption Standard via Optimisation Heuristics.Int. J. of Computer Science and Network Security. 6, [38] Verma, A. K., Dave, M. and Joshi. R. C Genetic Algorithm and Tabu Search Attack on the MonoAlphabeticSubsitution Cipher in Adhoc Networks.Journal of Computer Science. 3 (3), [39] Ganapathi, S. and Purusothaman, T Reduction of Key Search Space of Vigenere Cipher Using Particle Swarm Optimization.J. of Computer Science. 7, 11, [40] N. Gerald, W. Sean and A. Bas, 2002.Exploring Natural Language.John Benjamins Publishing Company. 52 IT4OD 2014
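The conclusion above notes that such attacks rely on "language statistics" to steer the key search. The following is a hedged sketch of that kind of fitness function — our own illustration, not the paper's actual cost function; a toy Caesar cipher stands in for a Feistel round purely to exercise it:

```python
# Hedged sketch of a language-statistics fitness function of the kind used by
# BAT/WPA/PSO key searches: score a candidate key by how closely the letter
# frequencies of the trial decryption match known English statistics.
# The `decrypt` callable and the exact cost are assumptions, not the paper's
# implementation.
from collections import Counter

# Approximate relative frequencies of the ten most common English letters.
ENGLISH_FREQ = {'e': 0.127, 't': 0.091, 'a': 0.082, 'o': 0.075, 'i': 0.070,
                'n': 0.067, 's': 0.063, 'h': 0.061, 'r': 0.060, 'd': 0.043}

def fitness(candidate_key, ciphertext, decrypt):
    """Lower is better: total deviation between observed and expected
    letter frequencies in the trial decryption."""
    trial = decrypt(ciphertext, candidate_key)
    letters = [c for c in trial if c.isalpha()]
    if not letters:
        return float('inf')
    counts = Counter(letters)
    n = len(letters)
    return sum(abs(counts.get(ch, 0) / n - f) for ch, f in ENGLISH_FREQ.items())

def caesar_decrypt(text, shift):
    # Toy stand-in for the real decryption routine.
    return ''.join(chr((ord(c) - 97 - shift) % 26 + 97) if c.isalpha() else c
                   for c in text.lower())

plaintext = ("the watermark should be inaudible and robust so that the owner "
             "can prove copyright even after common signal processing attacks "
             "on the marked audio signal")
ciphertext = caesar_decrypt(plaintext, -3)          # encrypt: shift letters by +3
scores = {k: fitness(k, ciphertext, caesar_decrypt) for k in range(26)}
best_key = min(scores, key=scores.get)              # exhaustive search over toy keys
```

A metaheuristic such as BAT or WPA would minimize this same cost over a much larger key space instead of enumerating it exhaustively.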

Secure audio watermarking algorithm: An application for copyright protection

Mustapha Hemis and Bachir Boudraa
Speech communication and signal processing laboratory
University of Sciences and Technology Houari Boumediene (USTHB)
BP 32 El Alia, Algiers, Algeria

Abstract — Digital watermarking plays an important role in the copyright protection of multimedia data. In this paper, an efficient robust audio watermarking algorithm based on double transforms is presented. In a first step, the original signal is decomposed by the Discrete Wavelet Transform (DWT). Then the prominent approximation coefficients are segmented into non-overlapping 2D blocks, and the Singular Value Decomposition (SVD) is applied to each one. The watermark is embedded in the Singular Values (SVs) of each block. Watermark extraction is non-blind and is done by performing the inverse of the embedding process. Experimental results show that this scheme is robust against common signal processing attacks such as noise addition, filtering and MP3 compression, and that inaudibility is satisfactory. Moreover, this method uses a double insertion key, making it suitable for secure applications such as copyright protection.

Index Terms — Multimedia security, Audio watermarking, DWT, SVD.

I. INTRODUCTION

The development of communication networks and digital supports like the Compact Disc, the Digital Versatile Disc... involves a massive diffusion of documents stored in digital formats: MPEG (Motion Picture Expert Group), MP3 (MPEG-1/2 Audio Layer 3), JPEG (Joint Photographic Experts Group)... These techniques make it possible to store a great amount of information, but they also facilitate the illegal use of documents. Digital watermarking has been introduced as a technique to solve problems as varied as copyright protection, content authentication, fingerprinting and broadcast monitoring.
Audio watermarking is a process that embeds data called a watermark or digital signature into a multimedia object, such that the watermark can be detected or extracted later to make an assertion about the object [1]. Audio watermarking techniques should at least ensure the following requirements: (i) Imperceptibility: the original and watermarked signals must be perceptually similar. (ii) Robustness: the ability to retrieve the inserted watermark even though the watermarked signal has been manipulated; the watermark should resist various intentional or unintentional attacks. (iii) Security: only an authorized person can extract the watermark. (iv) Capacity: the number of bits that can be embedded into the audio signal per time unit, measured in bps (bits per second); it should be more than 20 bits per second. These requirements are often contradictory. Since robustness and imperceptibility are the most important requirements for digital audio watermarking, they should be satisfied first [2]. Most of the watermarking methods proposed over the last few years focus on image and video watermarking. Indeed, digital audio watermarking is more challenging than digital video and image watermarking, because the Human Auditory System (HAS) is significantly more sensitive than the Human Visual System (HVS). In recent years, several good audio watermarking algorithms have been developed. Most of these algorithms can be divided into two groups: algorithms in the time domain and algorithms in the frequency domain. In the time-domain approach, information is embedded directly into the amplitudes of the audio signal [3]-[5]. In the frequency domain, the host signal is transformed and the information bits are then embedded into the transformed samples. This approach includes the Fast Fourier Transform (FFT) [6], [7], the Discrete Cosine Transform (DCT) [8], [9] and the DWT [10], [11].
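As a toy illustration of the time-domain family, here is a generic least-significant-bit (LSB) scheme of our own, far simpler and less robust than the cited algorithms [3]-[5], assuming 16-bit integer PCM samples:

```python
# Minimal time-domain watermarking sketch: hide watermark bits in the least
# significant bit of PCM samples. This only illustrates the idea of embedding
# "directly into the amplitudes"; it is trivially erased by any re-quantization
# and is NOT one of the cited schemes.
def embed_lsb(samples, bits):
    """Return a copy of `samples` with `bits` written into the LSBs."""
    out = list(samples)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b   # clear the LSB, then set it to the bit
    return out

def extract_lsb(samples, n_bits):
    """Read back the first `n_bits` LSBs."""
    return [s & 1 for s in samples[:n_bits]]

samples = [1000, -2431, 523, 77, -12, 9001, 44, -5]   # toy 16-bit PCM values
bits = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_lsb(samples, bits)
recovered = extract_lsb(marked, len(bits))
```

Each sample changes by at most one quantization step, which is why such schemes are imperceptible yet fragile — exactly the tradeoff that motivates the transform-domain methods discussed next.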
However, the use of only one domain, temporal or frequency, in an audio watermarking algorithm may be disadvantageous for two reasons: (1) in the time or frequency domain, audio quality degrades significantly when the size of the watermark increases; (2) the watermark bits inserted in the frequency coefficients are slightly altered during the transformation from the frequency domain to the time domain and vice versa. For these reasons, hybrid algorithms have been proposed. In this work, we continue to enhance the work of Himeur et al. [12], realized in our laboratory. We propose here a double-transformation audio watermarking algorithm based on DWT and SVD. These two transforms are applied to the original signal. The insertion is done in the low-frequency zone in order to ensure more robustness against common signal processing. In this case, the extraction is non-blind (the original audio signal is needed in the extraction phase) in order to improve the robustness of the watermark and the reliability of the extraction process. Moreover, our method uses a double key in order to add more security. The rest of this paper is organized as follows: Section 2 gives a brief overview of some related works. Section 3 describes the transformations used in this method. In Section 4 the proposed watermarking method is described. Section

5 presents experimental results and comparison. Section 6 concludes this paper.

II. RELATED WORKS

To ensure the best performance, hybrid algorithms combining two transforms have been developed. Chen and Zhu have presented a scheme combining DWT and DCT [13]. The multiresolution characteristics of the DWT, the energy compression characteristic of the DCT, and the Gaussian noise suppression property of the higher-order cumulant are combined to extract essential features from the host audio signal. The insertion is done by a zero-watermarking technique, which hides the watermark in a secret key, not in the signal itself. Wang and Zhao also used DWT-DCT transforms [14]. They propose a digital audio watermarking scheme that prevents synchronization attacks. The watermark is embedded into the low-frequency components by adaptive quantization according to human auditory masking. Himeur et al. [12] proposed to combine the DWT with the Discrete Sine Transform (DST). Insertion of the watermark is done in both low and high frequencies. In order to increase security and robustness, the mark is encrypted by the Arnold transformation and coded with a BCH (Bose, Chaudhuri and Hocquenghem) error-correcting code. In their comparison, the authors confirmed that DWT-DST ensures a better imperceptibility-robustness compromise than DWT-DCT. Bhat et al. have proposed to use the SVD with the DWT [15]. The method is based on Quantization Index Modulation (QIM) as the insertion technique. The main idea is to divide the input signal into 2D blocks and then change the norm of the diagonal matrix S according to the bit value of the watermark. The FFT has also been combined with the SVD [16]; the authors propose to use a reduced version of the SVD (RSVD). Recently, Hu et al. have combined the Discrete Wavelet Packet Transformation (DWPT) with the SVD [17]. The authors exploited the flexibility of the DWPT to approximate the critical bands and adaptively determined suitable embedding strengths for carrying out QIM.
The SVD is employed to analyze the matrix formed by the DWPT coefficients and to embed watermark bits by manipulating singular values subject to perceptual criteria. The major limitation of the presented methods lies in their security. Indeed, in the best case, the algorithm uses only one key to encrypt the watermark image, thus ensuring only a low level of security. In Section 4, a high-security watermarking algorithm is presented in which two keys are used for the insertion and extraction processes.

III. THE USED TRANSFORMS

A. Discrete Wavelet Transform

The wavelet transform is a time-scale signal analysis technique. It was developed as an alternative to the Short Term Fourier Transform (STFT) to overcome the problems related to the properties of its time and frequency resolutions. More specifically, in contrast to the STFT, which provides uniform temporal resolution for all frequencies, the DWT provides a high temporal resolution and a low frequency resolution for high frequencies, and a high frequency resolution and a low time resolution for low frequencies. The wavelet transform is therefore a suitable tool for analysing non-stationary signals. The main objective of the wavelet transform is to hierarchically decompose an input signal into a series of low-frequency approximation subbands and their detail subbands [18]. Depending on the application and the length of the signal, the low-frequency part may be further decomposed into two parts of high and low frequencies. Fig. 1 shows a 3-level DWT decomposition of an input signal S.

Fig. 1. 3-level DWT decomposition of signal S (approximation coefficients ca1, ca2, ca3 and detail coefficients cd1, cd2, cd3)

The data obtained from the above decomposition are called the DWT coefficients. Moreover, the original signal can be reconstructed from these coefficients. This reconstruction is called the inverse DWT.

B. Singular Value Decomposition

SVD is a useful tool of linear algebra with several applications in image compression, watermarking, and other areas of signal processing.
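The hierarchical decomposition of subsection III-A can be sketched with a minimal Haar DWT. The paper does not state which wavelet family it uses, so Haar is our assumption purely for illustration:

```python
# 3-level Haar DWT sketch: at each level the signal splits into approximation
# coefficients (scaled local averages, low frequencies) and detail coefficients
# (scaled local differences, high frequencies); only the approximation is
# decomposed further, as in Fig. 1. The input length must be divisible by
# 2**levels. Division by sqrt(2) keeps the transform orthogonal (energy
# preserving), which also makes perfect reconstruction by the inverse DWT possible.
from math import sqrt

def haar_step(signal):
    approx = [(signal[i] + signal[i + 1]) / sqrt(2) for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels=3):
    """Return (final approximation, [cd1, cd2, ..., cd_levels])."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

ca3, (cd1, cd2, cd3) = haar_dwt([4, 2, 6, 8, 1, 3, 5, 7], levels=3)
```

Because the transform is orthogonal, the total energy of the input equals the total energy of ca3 plus all the detail coefficients — the property that lets the scheme concentrate the watermark in the energy-rich approximation band.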
A few years ago, SVD was exploited for image watermarking applications [19], [20]. SVD is an optimal matrix-decomposition technique in a least-square sense. It packs maximum signal energy into as few coefficients as possible. SVD has the ability to adapt to variations in the local statistics of a given signal, so watermarking schemes using SVD typically have a high payload. Most SVD-based watermarking schemes embed watermark bits by modifying the singular values (SVs) [21]. Let A be a matrix of size n × n. This matrix can be decomposed using the SVD as follows:

A = U D V^T = Σ_{i=1}^{n} λ_i U_i V_i^T   (1)

where U and V are orthogonal matrices of size n × n, D is an n × n diagonal matrix, and λ_i, i = 1, 2, ..., n are the diagonal elements of D. Only the non-zero elements are called the Singular Values (SVs) of matrix A.

IV. PROPOSED WATERMARKING ALGORITHM

The proposed method is based on the double transformation DWT-SVD. The basic idea is that the watermark is embedded into the SVs of the blocks of low-frequency coefficients of the audio signal. The main reasons are the following:

1) The audio signal in its original format is a mixture of high and low frequencies that form semi-sinusoidal waves with uneven amplitudes and wavelengths. The energy of the

signal is mostly concentrated in the low frequencies, while the high frequencies represent perceptually insignificant regions of the signal. The watermark should not be placed in these regions, since many common signal and geometric processes affect these components. For example, a watermark placed in the high-frequency spectrum of a signal can be easily eliminated, with little degradation to the signal, by any process that directly or indirectly performs low-pass filtering.

2) Changing the SVs slightly does not affect the signal quality, and the SVs do not change much after various types of common signal processing operations.

A. Watermark Insertion

The diagram of watermark insertion is shown in Figure 2. The insertion process is summarized by the following steps:

1) The watermark, which is a binary image, is decomposed into blocks W_j, j = 1, 2, ..., M, each of size r × r. Then the SVD is applied to each block:

W_j = U_j^w D_j^w (V_j^w)^T   (2)

The matrices U_j^w and V_j^w are saved as a key K1.

2) The audio signal is first decomposed up to the third level using the DWT. The N prominent peaks of the approximation coefficients ca3 are segmented into 2D matrix blocks B_j, j = 1, 2, ..., M, each of size r × r, where M is the number of blocks of the watermark image. N must be equal to M × r × r. The indices of the N prominent peaks are saved as a key K2.

3) The SVD is applied to each block B_j:

B_j = U_j^B D_j^B (V_j^B)^T   (3)

4) Each diagonal matrix D_j^w is multiplied by a scalar factor α and added to the corresponding matrix D_j^B in order to obtain the modified diagonal matrix D'_j:

[Figure 2 depicts the insertion flow: the original signal goes through the DWT; the N prominent peaks of the approximation coefficients (whose indices form key K2) are segmented into M matrix blocks and decomposed by SVD; after insertion, the inverse SVD, concatenation of the M modified blocks, reinsertion of the N modified peaks and the IDWT produce the watermarked signal. The watermark image is decomposed into M blocks whose SVD matrices (U_j^w, V_j^w) form key K1.]

Fig. 2.
Watermark Insertion Process

D'_j = D_j^B + α · D_j^w   (4)

5) The modified blocks B'_j are obtained using the diagonal matrix D'_j and the original matrices U_j^B, V_j^B as follows:

B'_j = U_j^B D'_j (V_j^B)^T   (5)

6) The modified blocks are concatenated in order to obtain the N modified peaks.

7) The N modified peaks are inserted back into the approximation coefficients.

8) The inverse DWT is computed to reconstruct the watermarked signal.

B. Watermark Extraction

In the case of copyright protection, detection is usually non-blind, as the original signal is available in the extraction process. In fact, for a copyright protection application, the presence of the original signal in the extraction is not a drawback for the method, because for such an application the owner has both the original and the watermarked signal. Figure 3 shows the extraction process. It is implemented using the following steps:

1) The watermarked audio signal is first decomposed up to the third level using the DWT. The N peaks of the approximation coefficients are extracted using the key K2 and then segmented into 2D matrix blocks B̂_j.

2) The SVD is then applied to each block B̂_j:

B̂_j = Û_j^B D̂_j^B (V̂_j^B)^T   (6)

3) Steps 1 and 2 are repeated for the original signal in order to obtain:

B_j = U_j^B D_j^B (V_j^B)^T   (7)

4) The diagonal matrices D̂_j^w of the watermark blocks are obtained as follows:

D̂_j^w = (D̂_j^B − D_j^B) / α   (8)

[Figure 3 depicts the extraction flow: both the watermarked and the original signal pass through the pre-embedding block; the peak indices of key K2 are used to extract the diagonal matrices D̂_j^w, and the inverse SVD with the matrices (U_j^w, V_j^w) of key K1, followed by block concatenation, yields the extracted watermark.]

Fig. 3. Watermark Extraction Process

TABLE I. AUDIO SIGNALS
Male voice: a male spoken voice
Female voice: a female spoken voice
Jazz: jazz music
Pop: pop music
Semph: symphony orchestra of STOLTZMAN
Guitar: guitar music

Fig. 4. Watermark Image

Fig. 5. SNR Results (dB) for the three methods (our method, DWT-DCT, DWT-DST) on the six signals

5) The watermark blocks Ŵ_j are reconstructed using the extracted matrices D̂_j^w and the matrices U_j^w and V_j^w obtained from the key K1:

Ŵ_j = U_j^w D̂_j^w (V_j^w)^T   (9)

6) The extracted watermark image is obtained by concatenation of the Ŵ_j blocks.

V. EXPERIMENTAL RESULTS

The performance of this algorithm is evaluated in terms of imperceptibility and robustness. Moreover, our system is compared with the system of Himeur et al. Various .wav signals of different styles (classical music, speech, musical instruments...) are used in the experiments (Table I). Each signal is a 16-bit signed mono signal sampled at 44.1 kHz with a duration of 40 seconds. The watermark is a binary image (Figure 4). The parameters used in these algorithms are set as follows: the block size r × r is 4 × 4, and the embedding strength coefficient α = 0.09. These parameters have been experimentally selected in order to achieve a good imperceptibility-robustness tradeoff and have been tuned on the validation data set.

A. Imperceptibility Results

An inaudible watermark is undoubtedly the first constraint of a watermarking scheme. Perceptually, the watermarked audio signal must be identical to the original. In terms of imperceptibility, there are several evaluation criteria for audio watermarking algorithms.
In this work, the Signal-to-Noise Ratio (SNR) is used to measure the similarity between the original audio and the watermarked audio (equation 10):

SNR = 10 log10( Σ_{i=0}^{L−1} x(i)² / Σ_{i=0}^{L−1} [x(i) − x_w(i)]² )   (10)

where x(i) and x_w(i) denote the host and the watermarked audio respectively, and L is the signal length. Imperceptibility improves as the SNR increases. The SNR results for the 6 signals are shown in Figure 5. It appears that the proposed method provides an SNR above 24 dB (mean SNR of about 27 dB), largely superior to the minimum value imposed by the IFPI (International Federation of Phonographic Industry) [22], which recommends that an audio watermarking algorithm should offer an SNR higher than 20 dB. It can also be noted that DWT-DST ensures more inaudibility than the other methods, with a mean SNR of 30 dB. This is due to the fact that DWT-DST uses a double frequency transformation and the DST compacts the input energy using more AC coefficients than the DCT.
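Eq. (10) translates directly to code. The following is a self-contained sketch on synthetic signals (a sine host with a small alternating perturbation — stand-ins of our own, not the paper's test data):

```python
# Compute the SNR of Eq. (10) between a host signal and its watermarked
# version, and check it against the 20 dB IFPI threshold quoted in the text.
# The signals here are synthetic stand-ins, not the paper's audio files.
from math import log10, sin

def snr_db(x, xw):
    """SNR = 10 * log10( sum(x^2) / sum((x - xw)^2) )."""
    signal = sum(s * s for s in x)
    noise = sum((s - w) ** 2 for s, w in zip(x, xw))
    return 10 * log10(signal / noise)

host = [sin(0.01 * i) for i in range(10_000)]
# Alternating +/-0.001 perturbation plays the role of the embedded watermark.
watermarked = [s + 0.001 * ((i % 2) * 2 - 1) for i, s in enumerate(host)]

value = snr_db(host, watermarked)
meets_ifpi = value > 20   # IFPI recommends an SNR above 20 dB
```

Shrinking the perturbation (i.e. lowering α in Eq. (4)) raises the SNR, which is exactly the imperceptibility side of the imperceptibility-robustness tradeoff discussed above.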

TABLE II. NORMALIZED CORRELATION RESULTS — NC values of our method, DWT-DCT and DWT-DST under the attacks: noise addition, cropping, re-sampling, noise reduction, re-quantization, MP3 compression (128 kbps), low-pass filtering and echo addition.

B. Robustness Results

In order to evaluate the robustness performance of this method, the Normalized Correlation (NC) is used to measure the similarity between the extracted and the original watermark (equation 11). We chose classical music (Semph) to perform this evaluation, because this kind of music has a larger frequency band and is more adequate for robustness evaluation. Various signal processing attacks are tested to assess the robustness of the methods: noise addition, noise reduction, echo addition, re-sampling, re-quantization, low-pass filtering, cropping and MP3 compression.

NC = Σ_{i=1}^{M1} Σ_{j=1}^{M2} w(i,j) ŵ(i,j) / ( √(Σ_{i=1}^{M1} Σ_{j=1}^{M2} w(i,j)²) · √(Σ_{i=1}^{M1} Σ_{j=1}^{M2} ŵ(i,j)²) )   (11)

where w and ŵ denote the original and extracted watermarks respectively, and i and j are the indices of the binary watermark image. Table II shows the robustness results in terms of Normalized Correlation obtained for the three methods. The watermark images extracted after the different attacks for the three methods are shown in Figure 6. From these results, we can see that our method (DWT-SVD) provides high robustness against most attacks, with a certain fragility against the noise reduction attack. The method is more robust than the two other techniques against the following attacks: cropping, echo addition and MP3 compression. For the two methods DWT-DCT and DWT-DST, it appears that echo addition seriously affects the extraction of the watermark; indeed, the extracted watermark is completely scrambled. We can also note that the DWT-DCT and DWT-DST methods present a certain fragility against MP3 compression and cropping attacks.

VI. CONCLUSION AND FUTURE WORK

A robust audio watermarking algorithm based on the double transformation DWT-SVD is presented in this paper.
The watermark is embedded into the SVs of the blocks of low-frequency coefficients of the audio signal. Experimental results show that this method is robust against common signal processing.

Fig. 6. Extracted watermarks for the three methods after various attacks: (a) noise addition (b) cropping (c) re-sampling (d) noise reduction (e) re-quantization (f) MP3 compression (g) low-pass filtering (h) echo addition.

Compared to the methods of Himeur et al. based on DWT-DCT and DWT-DST, our method ensures more robustness with an acceptable imperceptibility (SNR around 27 dB), making it suitable for copyright protection applications. Moreover, a certain level of security is guaranteed by the use of two secret keys in the insertion and extraction processes. As future work, we will try to find the best watermark insertion positions, which ensure a good robustness-imperceptibility tradeoff, using an optimization algorithm such as a genetic algorithm, Ant Colony Optimization or Particle Swarm Optimization.

REFERENCES

[1] F.A.P. Petitcolas, R.J. Anderson and M.G. Kuhn, Information hiding - a survey, Proceedings of the IEEE, vol. 87, no. 7, 1999.
[2] S. Katzenbeisser, F.A.P. Petitcolas, Information Hiding Techniques for Steganography and Digital Watermarking, Artech House, Boston, 2000.

[3] P. Basia, I. Pitas and N. Nicholaidis, Robust audio watermarking based on the time domain, IEEE Trans. Multimedia, vol. 3, no. 2.
[4] W-N. Lie, L-C. Chang, Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification, IEEE Trans. Multimedia, vol. 8, no. 1, pp. 46-59.
[5] F. Abd El-Samie, An efficient singular value decomposition algorithm for digital audio watermarking, Int. J. Speech Technol., vol. 12, no. 1, pp. 27-45.
[6] P.K. Dhar, M.I. Khan, C.H. Kim and J.M. Kim, An Efficient Audio Watermarking Algorithm in Frequency Domain for Copyright Protection, Communications in Computer and Information Science, vol. 122.
[7] D. Megias, J. Serra-Ruiz, M. Fallhpour, Efficient self-synchronised blind audio watermarking system based on time domain and FFT amplitude modification, Signal Process., vol. 90, no. 12.
[8] X. Wang, W. Qi, P. Niu, A new adaptive digital audio watermarking based on support vector regression, IEEE Trans. Audio Speech, vol. 15, no. 8.
[9] L-K. Yeo, H-J. Kim, Modified patchwork algorithm: a novel audio watermarking scheme, IEEE Trans. Speech and Audio Processing, vol. 11, no. 4.
[10] R. Wang, D. Xu, J. Chen and C. Du, Digital audio watermarking algorithm based on linear predictive coding in wavelet domain, in 7th IEEE International Conference on Signal Processing, vol. 1.
[11] S-T. Chen, G-D. Wu, H-N. Huang, Wavelet-domain audio watermarking scheme using optimisation-based quantization, IET Signal Process., vol. 4, no. 6.
[12] Y. Himeur, B. Boudraa and A. Khelalef, A Secure and High Robust Audio Watermarking System for Copyright Protection, International Journal of Computer Applications, vol. 53, no. 17.
[13] N. Chen and J. Zhu, A Robust Zero-Watermarking Algorithm for Audio, EURASIP Journal on Advances in Signal Processing, pp. 1-7.
[14] X. Wang and H. Zhao, A Novel Synchronization Invariant Audio Watermarking Scheme Based on DWT and DCT, IEEE Trans. Signal Processing, vol. 54, no. 12.
[15] V. Bhat K, I. Sengupta and A. Das, Audio Watermarking Based on Quantization in Wavelet Domain, ICISS, vol. 5352/2008, Springer.
[16] J. Wang, R. Healy and J. Timoney, A Novel Audio Watermarking Algorithm Based On Reduced Singular Value Decomposition, in Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP) Sixth International Conference, 2010.
[17] H-T. Hu, H-H. Chou, C. Yu and L-Y. Hsu, Incorporation of perceptually adaptive QIM with singular value decomposition for blind audio watermarking, EURASIP Journal on Advances in Signal Processing, 2014:12.
[18] N. Sriyingyong and K. Attakitmongco, Wavelet-based audio watermarking using adaptive tabu search, in 1st International Symposium on Wireless Pervasive Computing, pp. 1-5.
[19] P. Bao and X. Ma, Image adaptive watermarking using wavelet domain singular value decomposition, IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 1.
[20] E. Yavuz and Z. Telatar, Improved SVD-DWT based digital image watermarking against watermark ambiguity, in Proceedings of the ACM Symposium on Applied Computing.
[21] V. Bhat K, I. Sengupta and A. Das, An audio watermarking scheme using singular value decomposition and dither modulation quantization, Multimedia Tools Appl.
[22] IFPI (International Federation of the Phonographic Industry).

Agent-based fault tolerance for home automation systems

Zaiter Meriem 1,2
1: Université Larbi Ben M'hidi, Oum El Bouaghi
2: Laboratoire Lire, Université de Constantine 2, Algeria
Hacini Salima and Boufaida Zizette
Laboratoire Lire, Université de Constantine 2, Algeria
{salimahacini,

Abstract — Distributed systems are systems made up of several components, each performing a task in order to satisfy an individual or collective objective. To offer comfort and efficient service availability to ever more demanding users, distributed systems have become sophisticated. Home automation systems, which essentially aim at automating the tasks related to the equipment of a house, are an example of this category of systems. The components making up these systems vary from one usage context to another according to the objective of the systems themselves. Moreover, the occurrence of a fault in one of the components, or even in the environment, affects, in the short or long term, the overall operating state of the system. This problem is all the more serious for a system that, for example, locates an elderly and/or dependent person in their living space and detects any sign of danger (abnormal situation). This article sets out to answer the question: how can we keep satisfying users despite the presence of faults? In other words, how can the operational reliability of such systems be ensured?

Keywords — fault detection, fault, agent, fault tolerance, distributed system, home automation system.

I. INTRODUCTION

Home automation systems are component-based systems whose role is to control the operation of the equipment available in a home in order to ease everyday tasks; SLCA [1] and MASML [2] are examples of such systems. The operation of these systems is generally based on the realization of scenarios.
Each piece of equipment provides services described by a set of operations, and emits events that reflect its state changes. Home automation offers several advantages: (1) control of the whole house with even a simple gesture; (2) good comfort of use (automatic or semi-automatic); (3) the possibility of saving energy by controlling lighting or temperature consumption, as well as the possibility of remote monitoring of the house and of people at home (such as telemedicine), etc. Despite these advantages, problems can hinder the proper operation of these systems [3], such as: (1) the lack of flexibility when integrating the equipment of subsystems (for example the lighting system with the energy management system); (2) precarious management when automating the preferences of users, which change frequently... (3) the security problem caused by a bad manipulation of the system (intentional or not), which leads to an unsafe global system. In the literature, dependability is the property that allows users to place justified confidence in the service delivered to them. The service delivered by a system is its behaviour as perceived by its users; a user is another system (human or physical) that interacts with the considered system [4]. The dependability of a system can be ensured [5, 6] mainly by: Fault prevention, which relies on the use of computing components of very good quality (very reliable). Fault tolerance, which means that the system keeps operating even in the presence of a failure. Fault tolerance is the ability of a system to fulfil its function in spite of faults [7]. This tolerance can be guaranteed by fault detection and recovery.
Detection can notably be performed by likelihood checks [5, 6], based on hardware, software, or a special periodic test code. Recovery mechanisms [8] can be summarized as error elimination (via rollback, roll-forward or compensation) or fault prevention (via diagnosis, isolation, reconfiguration or reinitialization). Currently, home automation systems offer no fault-tolerance capability specific to the real needs of their users, particularly when scaling up [9]. On the other hand, many home automation systems use the agent concept for various reasons [10], such as mobility, adaptability, etc. Agent technology is indeed employed there because an agent can [11]: Offer a good solution to scaling up, if a home automation system is incorporated into a larger system. Be flexible and reactive. Migrate from one component to another. Keep track of the execution flow.

- communicate with a user, a component or other agents (a good criterion for scalability);
- be cloned or killed on demand, etc.

We too will use agents, exploiting: (1) some of their advantages, such as cloning and mobility; (2) the notion of context characterizing a home automation system, to gather potential faults, in order to present an agent-based fault-tolerant system.

The remainder of this article is organized in four sections. Section 2 is devoted to defining the faults that can harm the proper operation of the system, exploiting the characteristics of home automation systems and the notion of context. Section 3 presents the assumptions and algorithms underlying our fault-tolerance mechanism. Section 4 describes the case study and the preliminary results of our experimentation. Finally, a conclusion and perspectives close the article.

II. FAULTS IN HOME AUTOMATION SYSTEMS

Home automation environments can be tied to contexts of different natures. Each context represents the state of a component in the environment. A home automation system can therefore be classified as a context-aware system. There is a plethora of definitions of context [12]. We have, for our part, proposed a definition oriented towards fault detection: "Context is any exploitable piece of information, internal or external, relating to an entity, that can affect the future state of the application. Such information may depend on one or several entities. These entities, whatever their nature (hardware/software/human), can trigger events that influence the global state of the system. Consequently, the occurrence of a fault necessarily causes an immediate malfunction of the global system."

In home automation systems, faults can be inadvertently introduced during the design and/or implementation of the components of the global system, and also during their use, especially when the users are novices. Faults may therefore stem from software (an arbitrary deviation from the code, ...), from hardware (failure of a component or of its internal parts), from transmission errors such as the omission of sending or receiving messages, or from malicious attacks (for instance, code injected into the system by a malicious user can deviate the normal execution flow). Table 1 presents a categorization of these faults.

Table 1. The different fault categories

III. DETAILS OF OUR PROPOSAL

The dependability of our system rests on guaranteeing the continuity of the exchanges established between the different entities of the system. This continuity, denoted sf(ei), is expressed in this article in terms of the dependability of the components (entities) making up the context of the global system; more formally, the dependability (R) of a system S is viewed as:

R(S) = {sf(e1), sf(e2), sf(e3), ..., sf(en)}   (1)

To ensure good dependability, the system must implement a control mechanism involving all entities. This control system is supported by a set of Acls (local controller agents) together with a CG (global controller), whose roles are detailed in this section. The different controllers are represented by agents. The architecture of the control system, which is chiefly responsible for detecting and recovering from occurring faults, is presented in Figure 1. Each entity ei offers a set of services SO(ei) in the form of quadruplets. This set lists the services that entity (ei) can provide:

SO(ei) = {(s0, f(s0), c0, d0), (s1, f(s1), c1, d1), (s2, f(s2), c2, d2), ...}   (2)

where f(si) is the set of services managed by service si, and ci and di denote respectively the cost and the availability of service i ("available" or "busy"). These two parameters are used as a criterion for choosing a mirror service (cf. Figure 5). This set is sent to the global controller. In addition, whenever an entity (ei) connects to the system, two tasks are executed: ei issues its request asking for a service; as soon as it receives the set of replies, ei defines its set of functional dependencies D(ei) and sends it to the global controller. The functional dependency D(ei) of entity ei is defined by the set of pairs D(ei) = {(ej, sk), ...}, where sk is the service offered by ej to ei.
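As an illustration, the service-offer quadruplets of equation (2) and the dependency pairs D(ei) can be held in simple data structures. This is a minimal sketch with hypothetical names, not the authors' implementation:

```python
# Sketch of the structures behind equation (2) and D(ei): each entity
# advertises quadruplets (service, managed sub-services f(s), cost, availability)
# and records the (entity, service) pairs it functionally depends on.
from dataclasses import dataclass, field

@dataclass
class ServiceOffer:
    service: str          # s_i
    children: set         # f(s_i): services managed by s_i
    cost: float           # c_i
    availability: str     # d_i: "available" or "busy"

@dataclass
class Entity:
    name: str
    offers: list = field(default_factory=list)      # SO(e_i)
    dependencies: set = field(default_factory=set)  # D(e_i) = {(e_j, s_k), ...}

e1 = Entity("e1", offers=[ServiceOffer("s1", {"s2"}, cost=3.0, availability="available")])
e1.dependencies.add(("e2", "s4"))   # e1 depends on service s4 offered by e2
print(e1.dependencies)
```

Both structures would be serialized and sent to the global controller when the entity connects.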

Fig. 1. The architecture of the control system: local controller agents (Agent Contrôleur i, j, k, m) attached to entities (e i, e j, e k, e m) exchange internal and external events with the global controller agent, hosted by a carrier entity.

To carry out our mechanism, a set of constraints must be respected:
1) When an entity is no longer active or has reached its goal, the resources that were allocated to it are released.
2) Sent (respectively received) messages must be numbered in increasing order of emission (arrival); this avoids re-processing requests that are still in transit or being processed (see the instructions of event 34 in Figure 6).
3) Sent messages must be doubly signed, by the entity and by its local controller agent, in order to monitor the operation of the local controller agent itself, which will be killed and replaced by a new one carrying the most recent value of the entity's execution state (cf. Table 2).
4) When a malfunction of the global controller agent is detected, its duplicate is activated automatically. This detection is accomplished by soliciting the duplicate: the Acls run a voting algorithm to confirm the malfunction, and the system adapts automatically. Note that the duplicate holds the same view of the global system context as the ACG, as it is updated periodically.
5) If the entity hosting the global controller agent malfunctions, the agent automatically migrates to another entity.

A. The global controller
The global controller is an agent that provides a set of functionalities for managing fault detection and fault handling. To do so, this controller relies on a set of information held in a knowledge base. The knowledge base of the global controller thus contains information about the entities of the system, such as: the identity of the entity, denoted ei; its state, as a set of pairs (ss, state), where each pair represents a service offered by the entity together with that service's state; the description of the list of services offered by the entity and the set of its functional dependencies; and the execution state emitted periodically by the entity's local agent (cf. Table 2).

Table 2. The information characterizing an entity at the CG level

Entity | State | Offered service | Dependencies | Execution state
e1 | (s1, good) (s2, good) | (s1, f(s1), c1, d1) | (e2, s4), ... | Ex1

a) The global controller agent as fault-detection manager: the operation of the global controller agent is detailed through a set of algorithms. Some of them manage the events circulating in the system (cf. Figure 2), such as the notification of a fault at entity ei by its own local controller agent (cf. instruction 10 of Figure 2) or by another local controller agent (cf. instruction 15 of Figure 2), as well as other types of events.

Terminology
- nv_elt: a new entity has connected to the global system.
- rechercher(E, S): proposes a set of mirror services (sm) to the entity affected by the failure.
- charger_dc(ei): adds to the knowledge base (cf. Figure 2) an entry containing information on entity ei (état(ei), D(ei), ...).
- Extract_état(ej, ss): retrieves the state of service ss of entity ej from the knowledge base (cf. Table 2).
- Rv(Cg, ej, ss): a verification request sent by the global controller agent to ej in order to test its state.
Upon receiving the set of functional dependencies of a given entity (ei) (cf. instruction 9 of Figure 2), the global controller agent executes the procedure that updates the availabilities of the services.

1  Input: event
2  Begin
3  Repeat
4    Case (event) of:
5      nv_elt:
6        create the local controller agent Acl_i
7        charger_dc(ei)
8      D(ei):
9        update the availabilities (D(ei))
10     alerte(Acl_i, ei, ss):
11       P <- prise_en_charge1(ei, ss)
12       send faute(ei, Cg, ss) to every entity of the set P
13       rechercher(E, S) for every entity of the set P
14       send Miroir(Cg, em, sm) to every entity of the set P
15     alerte(Acl_i, ej, ss) OR bon_état(Acl_j, ej, ss):
16       verif_état(ej, ss)
17     alerte(Acl_i):
18       If vérifier_sig(Acl_i) then
19         create a new agent Acl_i with the state EX_i and kill the old one
20       End if
21  Until (false)
22 End
Fig. 2. Task of the global controller

1  Input: entity ej, service ss
2  Output: state of ej
3  Begin
4    état(ej, ss) <- Extract_état(ej, ss)
5    if (état(ej, ss) = failure) then
6      send faute(ej, Cg, ss) to Acl_i
7    else
8      send Rv(Cg, ej, ss) to Acl_j
9      if (Rep(ej, ss)) then
10       send bon_état(Cg, ej, ss) to Acl_i
11     else
12       P <- prise_en_charge1(ej, ss)
13       send faute(ej, Cg, ss) to every entity of the set P
14       rechercher(E, S) for every entity of the set P
15       send Miroir(Cg, em, sm) to every entity of the set P
16     End if
17   End if
18 End
Fig. 3. The verif_état function

The set P is the set of entities that functionally depend on the faulty service (ss) and on its children fils(ss) (cf. Table 2).

b) The fault-tolerant global controller agent: as soon as the detection of a fault is confirmed, the global controller agent executes the following two sub-tasks to handle it. First, it declares entity (ei) as partially faulty with respect to service s, by executing the algorithm of Figure 4, which signals the failure of the service provided by an entity to the entities that depend on it (cf. instruction 11 of Figure 2; instructions 12 and 13 of Figure 3). The fault-handling algorithm relies on exploring D(ei) and the description of the list of offered services (cf. Table 2) to signal the fault.
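This exploration of the dependency sets can be rendered in Python. The sketch below follows the paper's `prise_en_charge1` (names and data layout are assumptions for illustration): it collects every (entity, service) pair that functionally depends on the faulty service `ss` or on one of its child services `fils(ss)`.

```python
# Hypothetical rendering of the fault-handling exploration: gather the set P
# of (entity, service) pairs affected by a failure of service ss at entity ei.
def prise_en_charge1(ei, ss, deps, fils):
    """deps: dict entity name -> set of (entity, service) dependencies D(e_j).
    fils: dict service -> set of child services f(s)."""
    affected = {ss} | fils.get(ss, set())   # the faulty service and its children
    P = set()
    for ej, d_ej in deps.items():
        for (en, sn) in d_ej:
            if en == ei and sn in affected:
                P.add((ej, sn))
    return P

deps = {"e2": {("e1", "s1")}, "e3": {("e1", "s2"), ("e4", "s9")}}
fils = {"s1": {"s2"}}
# e2 depends on s1 directly; e3 depends on s2, a child of s1: both are affected.
print(prise_en_charge1("e1", "s1", deps, fils))
```

Every entity in the returned set P is then notified with a faute(ei, Cg, ss) event and offered a mirror service.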
The fault-handling algorithm calls the procedure that updates the state of the failed service and of all its children.

Function: prise_en_charge1
1   Input: (ei, ss)
2   Output: set of pairs P (entity e, service s)
3   Begin
4     For j = 1 to NE do    // NE: the number of entities in the system
5       Repeat
6         Dt(ej) <- D(ej)   // temporary variable for the processing
7         (en, sn) <- extract an element of Dt(ej)
8         If en = ei and sn in {ss} U {fils(ss)} then
9           insert (ej, sn) into P
10          update état(ei, ss)
11        End if
12      Until Dt(ej) = Ø
13    End for
14    Return (P)
15  End
Fig. 4. Fault handling

The second sub-task of the global controller agent (cf. instruction 13 of Figure 2; instruction 14 of Figure 3) consists in ensuring the continuity of the system's operation by executing the algorithm of Figure 5, which searches for similar services; (se, ce, te) denotes the elected service.

Function: rechercher(E, S)
1   Input: pair (ei, s)
2   Output: pair (entity, service)
3   Begin
4     ts <- busy            // availability of the service
5     état_e <- failure     // state of the service, from Table 2
6     se <- s
7     ss <- se
8     ce <- max(c)
9     élu <- i
10    For j = 1 to NE do
11      If (ss = se) and (ts = "available") and (cs <= ce) and (état_e = "good") then
12        (se, ce, te) <- (ss, cs, ts)
13        élu <- j
14      End if
15      (ss, cs, ts) <- extract an offered service
16      état_e <- Etat(ss)
17    End for
18    If état_e = "failure" then
19      Return (Ø, Ø)
20    Else
21      Return (e_élu, se)
22    End if
23  End
Fig. 5. The search for similar services

B. The local controller
The idea of implementing a local controller stems from self-testing, which allows an individual check of each entity. Each local controller thus executes a set of tasks that allow it to verify the operation of the entity associated with it.
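The mirror-service selection of Figure 5 amounts to a minimum-cost search over matching, available, healthy offers. The following is a sketch under assumed semantics (flat list of offers, hypothetical field layout), not the authors' code:

```python
# Sketch of the mirror-service search: among entities offering the same
# service as the failed one, elect an available, non-faulty offer of
# minimal cost; return (None, None) when no mirror service exists.
def rechercher(s, offers):
    """offers: list of tuples (entity, service, cost, availability, state)."""
    best = None
    for (e, ss, cost, avail, state) in offers:
        if ss == s and avail == "available" and state == "good":
            if best is None or cost < best[2]:
                best = (e, ss, cost, avail, state)
    if best is None:
        return (None, None)     # no mirror service can be elected
    return (best[0], best[1])   # elected entity and service

offers = [("e2", "s1", 5.0, "available", "good"),
          ("e3", "s1", 2.0, "available", "good"),
          ("e4", "s1", 1.0, "busy", "good")]
print(rechercher("s1", offers))   # e4 is cheapest but busy, so e3 is elected
```

The elected pair is what the global controller announces to the affected entities through the Miroir(Cg, em, sm) event.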
A local controller (Acl_i) is therefore an agent that provides the following functionalities:
- capture the external events (solicitations) reaching entity (ei). An external event can be: (1) a simple request from an entity (ej) (cf. instruction 5 of Figure 6); (2) negative feedback coming from an entity (ej) and resulting from a possible past interaction with

entity (ei); (3) a query about the state of a service (ss) issued by entity (ei), or a failure notification about (ss) issued by entity (ej) when (ei) functionally depends on entity (ej) (this event is triggered by the global controller agent) (cf. instruction 10 of Figure 6);
- signal the malfunction of entity (ei), due to one of the two following eventualities: (1) negative feedback in the case of an erroneous result (for example, a returned value outside the specification domain) or a "no-response" signal about (ei) (this signal is triggered by the Acl_j of entity (ej)); (2) no answer to the periodic control test executed by the controller agent (Acl_i) itself, in which case the controller can signal a malfunction of (ei) (cf. instructions 18 and 20 of Figure 6);
- generate a "no-response" signal about entity (ej) if it does not keep its promise to deliver a service;
- keep the execution trace, in order to capture the execution state that will be used for rollback in case of a fault in a delivered service (cf. instruction 42 of Figure 6).

Before giving the operating algorithms of the controller agent, note that the normal operation of an entity consists in sending a request (RS(ei, ej, ss)), waiting for the replies and possibly defining functional dependencies (sauvegarder(RS(ei, ej, ss)); attente(RP(ej, ss))) (cf. instructions 7-9 of Figure 6); but this agent (Acl_i) sends an alert (alerte(Acl_i, ej, ss)) if an entity ej promised ei to deliver the service ss and did not answer, or if it returned an erroneous result. The processing of each event is preceded by a signature check, and an alerte(Acl_i) is sent when an error is detected.

1.  Input: event
2.  Begin
3.  Repeat
4.    Case (event) of:
5.    RS(ej, ei, ss) and not (Verify_Rq(ej, ei, ss)):
6.      sauvegarder(RS(ei, ej, ss));
7.      send df(ei, ss) to Acl_j
8.      process (RS(ei, ej, ss));
9.      send (RP(ei, ss));
10.   faute(ej, Cg, ss) or alerte(Acl_j, ej, ss):
11.     if (Verify_Rq(ei, ej, ss)) then
12.       cancel (RS(ei, ej, ss))
13.     else
14.       état(ej) <- failure
15.     end if
16.   alerte(Acl_j, ei, ss):
17.     If (état(ei, ss) = failure) Then
18.       send alerte(Acl_i, ei, ss) to Cg
19.       send alerte(Acl_i, ei, ss) to Acl_j
20.     Else
21.       If ((verif_état_local(ei, ss, 0)) = 0) Then
22.         send alerte(Acl_i, ei, ss) to Cg
23.         send alerte(Acl_i, ei, ss) to Acl_j
24.       Else
25.         send bon_état(Acl_i, ei, ss) to Acl_j
26.         send bon_état(Acl_i, ei, ss) to Cg
27.       End If
28.     End if
29.   df(ej, ss):
30.     update D(ei);
31.     send D(ei) to Cg
32.   bon_état(Acl_j, ej, ss) AND not RP(ej, ss):
33.     send RS2(ei, ej, ss) to Acl_j   // second sending of the request, numbered with the same number as the first
34.   bon_état(Cg, ej, ss) AND not RP(ej, ss) and vérifier(RS2(ei, ej, ss)):
35.     send RS(ei, ej, ss) to Cg   // use the Cg as a gateway for retrieving the result
36.     sauvegarder(RS(ei, Cg, ss));
37.     attente(RP(Cg, ss));
38.   Miroir(Cg, em, sm):
39.     send RS(ei, em, sm) to Acl_m
40.     sauvegarder(RS(ei, em, sm));
41.     attente(RP(em, sm))
42.     request the current execution state from the entity
43.   bon_état(Acl_j, ej, ss):
44.     attente RP(ej, ss)
45.   not RP(Cg):
46.     launch the verification of the CG by the duplicate, to take the decision
47.     declare degraded mode in the system
48.   RP(ej, ss) or RP(Cg, ss):
49.     validate RP(ej, ss)
50.   not RP(ej, ss):
51.     send alerte(Acl_i, ej, ss) to Cg
52.     send alerte(Acl_i, ej, ss) to Acl_j
53.   vérifier_sig(Cg):
54.     send an "I am alive" message to Cg
55. Until (false)
56. End
Fig. 6. Operating algorithm of the local controller agent

Terminology
- RS(ei, ej, ss): a simple request sent by ei to ej asking for the service ss; it denotes a case of normal operation.
- RP((ei, ss)/Cg): indicates that ei (or Cg) has answered a request.
- faute(ej, Cg, ss): indicates a failure of entity ej (for service ss) signalled by the global controller agent.
- Rep(ei, ss): a boolean function representing whether or not ei answers the local test launched by the local controller agent Acl_i for service ss.
- verif_état_local(ei, ss, t): the local test, for service ss, triggered after a duration t; it returns 0 if entity ei did not answer, and 1 otherwise.
- alerte(Acl_i, ei, ss): denotes the failure of a service ss provided by an entity, signalled by its local controller Acl_i.
- bon_état(Acl_i, ei, ss): an event indicating that the service ss provided by ei is in good state.
- df(ej, ss): denotes a promise by entity ej to deliver the service ss (functional dependency on service ss).
- Verify_Rq(ei, ej, ss): a function that checks whether or not a request sent by ei is currently being carried out by ej.
- sauvegarder(RS(ei, ej, ss)): records the request (sent by ei, to be carried out by ej).
- attente(RP(ej, ss)): an instruction indicating that a reply to ei's request is awaited from entity ej.
- Miroir(Cg, em, sm): an event emitted by the global controller agent indicating the elected mirror service sm and the identity of the entity that provides it, in order to guarantee the continuity of the global system's operation.
- vérifier(RS2(ei, ej, ss)): a function that checks whether or not a second request was sent by ei to ej.
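The periodic self-test and alerting behaviour of the local controller (instructions 16-28 of Figure 6) can be sketched as follows. The probe callable and event tuples are illustrative assumptions; only the 0/1 contract of `verif_état_local` comes from the paper:

```python
# Minimal sketch of the local self-test: the local controller probes its
# entity for a given service and emits alerte/bon_état events accordingly.
def verif_etat_local(entity, ss, responder):
    """Return 1 if the entity answers the probe for service ss, 0 otherwise.
    responder is a callable standing in for the entity's test hook."""
    try:
        return 1 if responder(entity, ss) else 0
    except Exception:        # a crashed entity simply does not answer
        return 0

def periodic_check(entity, ss, responder, send):
    """Run one control-test round and report to the other controllers."""
    if verif_etat_local(entity, ss, responder) == 0:
        send(("alerte", "Acl_" + entity, entity, ss))    # to Cg and Acl_j
    else:
        send(("bon_etat", "Acl_" + entity, entity, ss))

sent = []
periodic_check("e1", "s1", lambda e, s: False, sent.append)  # entity does not answer
print(sent)
```

In the full mechanism these events are additionally signed, so that the global controller can also detect a misbehaving local controller and replace it (constraint 3 above).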

IV. EXPERIMENTATION
The home automation system chosen for the experimentation concerns the monitoring of a sick, elderly and/or dependent person in their living place, and makes it possible to detect any sign of danger (abnormal situation). Home automation indeed aims to provide technical solutions for the needs of comfort, safety, security (alarms) and communication (remote controls, visual or audio signals, etc.) found in houses, hotels, public places, etc. The components considered by our system are a glucometer, an automated external defibrillator (for the automatic analysis of the heart rhythm and the delivery of a defibrillation shock to the victim of a sudden cardiac arrest), an automatic electronic wrist blood-pressure monitor, an air conditioner, a television set, an oven, a heater, a fire detector, a mobile phone, etc. The experimentation is based on problem scenarios that are simulated by fault injection, followed by the verification of fault detection. From the results obtained and presented in Figure 7, we observe that the faults are detected at a good rate, notably those of the local or global controller as well as those linked to the user (hardware or software). Their handling, however, is less good, because it depends, on the one hand, on the persistence of the fault (permanent faults in particular, see Table 1, require human intervention) and, on the other hand, on the number of mirror services existing in the system.

Fig. 7. A graphical representation of the detection and fault-handling results

V. CONCLUSION
Our approach relies on the use of agents to monitor the proper operation of the system and to offer good availability of the crucial services in a home health-care automation system.
The behaviour of this context-aware system depends on the situation of its components, and a malfunction of one of the components will certainly affect the operation of the system; fault tolerance solves this problem. The fault-tolerance mechanism presented in this article exploits, among other things, replication, digital signatures and agents, whose advantages, such as cloning and migration, can only strengthen it. Controller agents thus contribute to the detection and recovery of possible faults. We plan to test the proposed mechanism on other types of faults, notably those caused by the environment. We are also working on distributing the control among the local agents themselves: each local agent will then be able to update the state of the service offered by a faulty entity upon receiving a failure notification. Consequently, after a certain time, each agent will hold a partial view of the system state and a global view with respect to the needs of its entity, which will allow it to build its own communication community dynamically.

REFERENCES
[1]. J. Y. Tigli, S. Lavirotte, G. Rey, V. Hourdin, D. Cheung-Foo-Wo, E. Callegari and M. Riveill, "WComp Middleware for Ubiquitous Computing: Aspects and Composite Event-based Web Services", Annals of Telecommunications, 64(3-4): 197-214, April.
[2]. C. L. Wu, C. F. Liao and L. C. Fu, "Service-oriented smart-home architecture based on OSGi and mobile-agent technology", IEEE Transactions on Systems, Man, and Cybernetics, Part C, 37(2): 193-205.
[3]. A. J. Bernheim Brush, Bongshin Lee, Ratul Mahajan, Sharad Agarwal, Stefan Saroiu and Colin Dixon, "Home Automation in the Wild: Challenges and Opportunities", in ACM Conference on Computer-Human Interaction (CHI), May 2011.
[4]. J.-C. Laprie, J. Arlat, J.-P. Blanquart, A. Costes, Y. Crouzet, Y. Deswarte, J.-C. Fabre, H.
Guillermain, M. Kaâniche, K. Kanoun, C. Mazet, D. Powell, C. Rabéjac and P. Thévenod, Guide de la sûreté de fonctionnement, 369 p.
[5]. L. Lamport, R. Shostak and M. Pease, "The Byzantine Generals Problem", ACM Transactions on Programming Languages and Systems, 4(3): 382-401, July 1982.
[6]. J. J. Horning, H. C. Lauer, P. M. Melliar-Smith and B. Randell, "A program structure for error detection and recovery", Proceedings of the Conference on Operating Systems: Theoretical and Practical Aspects, IRIA, France, Springer-Verlag, Lecture Notes in Computer Science Vol. 16, April.
[7]. A. Avizienis, J.-C. Laprie, B. Randell and C. Landwehr, "Basic Concepts and Taxonomy of Dependable and Secure Computing", IEEE Transactions on Dependable and Secure Computing, pp. 11-33, Jan.-Mar. 2004.
[8]. T. Pareaud, Adaptation en ligne de mécanismes de tolérance aux fautes par une approche à composants ouverts, PhD thesis, Université de Toulouse, defended on 27 January.
[9]. S. Sohail and T. Javid, Robust Home Care Access Network, Master's thesis in Computer Systems Engineering, School of Information Science, Computer and Electrical Engineering, Halmstad University, May.
[10]. F. Hamoui, C. Urtado, S. Vauttier and M. Huchard, "SAASHA: A self-adaptable agent system for home automation", in Proc. of the 36th EUROMICRO Conference on Software Engineering and Advanced Applications, pages 227-230, IEEE Computer Society, September.
[11]. D. B. Lange and M. Oshima, "Seven Good Reasons for Mobile Agents", Communications of the ACM, Vol. 42, No. 3, March.
[12]. A. K. Dey and G. D. Abowd, "Towards a Better Understanding of Context and Context-Awareness", at the CHI 2000 Workshop (2000).

Robust Multimodal Personal Identification System Using Palmprint & Palm-vein Images
Othaila Chergui 1, Abdallah Meraoumia 2, Hakim Bendjenna 1 and Salim Chitroub 3
1 University of Tebessa, Computer Science Department, Tebessa, Algeria
2 Univ Ouargla, Fac. des nouvelles technologies de l'information et de la communication, Lab. de Génie Électrique, Ouargla, Algeria
3 Signal and Image Processing Laboratory, Electronics and Computer Science Faculty, USTHB, P.O. box 32, El Alia, Bab Ezzouar, 16111, Algiers, Algeria

Abstract - Automatic personal identification plays an important role in security systems. Nowadays, the physiological and behavioral characteristics of a person, popularly known as biometrics, are increasingly used for security purposes. Recently, two novel hand-based biometric modalities, PaLmPrint (PLP) and PaLm-Vein (PLV), have attracted an increasing amount of attention. In this paper, we propose a new scheme for improving hand-based identification by combining the results of several classifiers and using the discrete CoNtourlet Transform (2D-CNT) method for feature extraction. In this study, the PLP (PLV) is decomposed into several bands using the 2D-CNT at one level. Subsequently, some of the resulting bands are used to create the feature vector. Based on this feature vector, two sub-systems can be created for each modality. The first one is based directly on this feature vector, whereas the second uses a Hidden Markov Model (HMM) to model the feature vector. Furthermore, the results of the different classifiers are combined using a matching-score-level fusion strategy. The proposed system is tested and evaluated using a database of 400 users. The obtained experimental results show the effectiveness and reliability of the proposed system, which achieves a high identification accuracy.

Index Terms - Biometrics, Identification, Palmprint, Palm-vein, Contourlet, HMM, Data fusion.

I.
INTRODUCTION
Reliability in personal identification is key to the stringent security requirements in both social and industrial environments. Over the last few years, increasing security concerns have promoted research into different biometric technologies for human identity assessment [1]. As a result, a number of biometrics-based technologies have been developed, and hand-based person identification is one of them. This technology provides a reliable, low-cost and user-friendly solution for many applications that require a reliable identification scheme [2]. The human hand contains a wide variety of biometric modalities, like hand geometry, fingerprint, finger-knuckle-print and PaLmPrint (PLP), that can be used by biometric identification systems. Among these modalities, PLPs are relatively stable and the hand image from which they are extracted can be acquired relatively easily. Therefore, in the past few years, PLP has attracted an increasing amount of attention and has proven to be a unique biometric identifier [3]. Several studies of PLP-based personal identification have thus focused on improving the performance with PLP images captured under visible light. However, during the past few years, some researchers have used more features from the palm, such as the palm veins, to improve the performance of these systems. The veins of the palm mainly refer to the inner vessel structures beneath the skin, and PaLm-Vein (PLV) images can be collected using Near-InfraRed (NIR) light [4]. Palm veins carry a low risk of falsification, are difficult to duplicate and are stable, because they lie under the skin.
Moreover, the availability of devices that can acquire PLP and PLV images simultaneously has promoted research into constructing a multimodal system based on the fusion of the PLP and PLV modalities, in order to overcome some of the limitations imposed by unimodal systems, such as the insufficient accuracy caused by noisy data acquisition in certain environments [5]. In this work, we propose a new scheme that combines the results of the PLP and PLV modalities and uses the discrete CoNtourlet Transform (2D-CNT) method for feature extraction. In this study, the PLP (PLV) is decomposed into several bands using the 2D-CNT at one level. Subsequently, some of the resulting bands are used to create the feature vector. Based on this feature vector, two sub-systems can be created for each modality. The first one is based directly on this feature vector, whereas the second uses a Hidden Markov Model (HMM) to model the compressed feature vector (obtained using Principal Component Analysis (PCA)). Furthermore, the results of the different sub-systems are combined using a matching-score-level fusion strategy. The rest of the paper is organized as follows: the proposed scheme of the multimodal biometric system is presented in section 2. Section 3 gives a brief description of the region-of-interest extraction. Feature extraction and the modeling process are discussed in section 4. Section 5 describes the matching and normalization method. The fusion technique used for fusing the information is detailed in section 6. In section 7, the experimental results, prior to fusion and after fusion, are given and commented. Finally, conclusions and further work are presented in section 8. II. MULTIMODAL SYSTEM OVERVIEW The proposed system is composed of two sub-systems exchanging information at the matching score level. Each sub-system exploits a different feature extraction technique. There

are two phases in each unimodal system (for example, Fig. 1 shows the block diagram of the proposed unimodal identification system based on the PLV modality): enrollment and identification. Both phases, for the two sub-systems, comprise two sub-modules: preprocessing, for PLV localization, and observation-vector construction, for feature extraction. For the second sub-system, the enrollment phase includes an additional sub-module, HMM modeling, which models the observation vector and stores its parameters in the system database. The identification phase includes the additional sub-modules matching, normalization and fusion, which compute the dissimilarities and fuse the two normalized matching scores. Finally, based on this single matching score, a final decision, to accept or reject the user, is made.

Fig. 1. Multiple-algorithm based palm-vein identification system (enrollment: preprocessing, feature extraction, HMM modeling, database; identification: matching, normalization, score fusion, decision).

Fig. 2. Various steps in a typical region-of-interest extraction algorithm: (a) the filtered image, (b) the binary image, (c) the boundaries of the binary image and the points for locating the ROI pattern, (d) the central portion localization, and (e) the preprocessed result (ROI).

III. ROI EXTRACTION FROM PLP & PLV IMAGES
In order to localize the Region Of Interest (ROI) area, the first step is to preprocess the palm images; we use the preprocessing technique described in [6] to align both the PLP & PLV images. In this technique, a Gaussian smoothing filter is used to smooth the image before extracting the ROI area. After that, Otsu's thresholding is used to binarize the hand, and a boundary-following algorithm is used to extract the hand contour.
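The Otsu step of this preprocessing chain can be sketched with NumPy. The implementation below is a textbook between-class-variance maximization on a synthetic bimodal image, not the authors' code:

```python
import numpy as np

def otsu_threshold(img):
    """Return the Otsu threshold of a uint8 image: the gray level that
    maximizes the between-class variance of the binarized image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                   # probability of class 0 (<= t)
    mu = np.cumsum(p * np.arange(256))     # cumulative mean of class 0
    mu_t = mu[-1]                          # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2)     # endpoints divide by zero
    return int(np.argmax(sigma_b2))

# A synthetic bimodal "hand" image: dark background pixels, bright palm pixels.
rng = np.random.default_rng(0)
img = np.concatenate([rng.integers(0, 60, 5000),
                      rng.integers(180, 255, 5000)]).astype(np.uint8)
t = otsu_threshold(img)
binary = img > t           # the binarized hand mask
assert 59 <= t < 180       # the threshold falls between the two modes
```

In practice one would apply a Gaussian blur first and run a boundary-following (contour-tracing) algorithm on `binary`, as the paper describes.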
The tangents at the two stable points, F 1 & F 2, on the hand contour (located between the forefinger and the middle finger, and between the ring finger and the little finger) are computed and used to align the PLP & PLV images. The central part of the image is then cropped to represent the whole PLP & PLV images. Fig. 2 shows an example of the PLV preprocessing steps.

IV. FEATURE EXTRACTION AND MODELING PROCESS
In our work, a new method for personal identification using the palm modality (PLP or PLV) is presented. From each representation, we design two sub-systems which employ different templates. In the first sub-system, the ROI sub-image is analyzed using the 2D-CNT technique [7] at one level. Then, the different decomposition bands are concatenated in order to construct the feature vector V 1 (with size [τ,η], τ = η = 128). For the second sub-system, we reduce the vector V 1 using the PCA technique [8]. A new feature vector, V 2 (with size [τ,κ], κ < η), which still includes most of the useful information of the original vector, is obtained. The goal is to compress V 1 in size and to discover compact representations of its variability. After that, an HMM model [9] (M) of the feature vector V 2 is constructed. As a result, each palm modality is represented by V 1 for the first sub-system and by M for the second. Fig. 3 shows the block diagram of the proposed feature extraction methods.

V. MATCHING AND NORMALIZATION METHOD
The matching process determines the similarity between two given templates/models. The feature vector obtained during recognition enters a comparison process against the user's template/model, according to the comparison algorithm used. Thus, during the identification process, the feature vector is extracted using the 2D-CNT method, and then the similarity score of the feature vector given each template/model is computed.
For the first sub-system, the Sum of Absolute Differences (SAD) is used:

d_S1 = Σ_{i=0..τ-1} Σ_{j=0..η-1} | V 1t (i,j) - V 1r (i,j) |    (1)

For the second sub-system, the log-likelihood score is used:

d_S2 = P(V 2t | M i) = l(V 2t, M i)    (2)

Therefore, the score vector is given by:

D = [d_S1^i, d_S2^i, d_S3^i, d_S4^i, ..., d_SN^i],  for all i in {1, 2}    (3)

where N represents the size of the system database.
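Equation (1)'s SAD matcher and the score vector of equation (3) are straightforward to sketch with NumPy (the random templates below are stand-ins for real 2D-CNT feature matrices):

```python
import numpy as np

# Sketch of the matching stage: equation (1)'s Sum of Absolute Differences
# between a test feature matrix V1t and each enrolled template V1r.
def sad(v1t, v1r):
    return float(np.abs(v1t - v1r).sum())

rng = np.random.default_rng(1)
v1t = rng.standard_normal((128, 128))                        # test feature matrix
templates = [v1t + 0.01 * rng.standard_normal((128, 128)),   # genuine template
             rng.standard_normal((128, 128))]                # impostor template
D = np.array([sad(v1t, v1r) for v1r in templates])           # score vector, eq. (3)
assert D[0] < D[1]   # the genuine template yields the smaller distance
```

Note that SAD is a distance (smaller is better) while the log-likelihood of equation (2) is a similarity (larger is better); this is exactly why the normalization of the next section uses different (α, β) parameters for the two sub-systems.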

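The scoring of Eqs. (1) and (3), followed by the min-max normalization of Eq. (4), can be sketched in Python as follows. This is an illustrative sketch of ours, not the authors' code: random matrices stand in for the 2D-CNT feature vectors V1, and (α, β) = (1, 1) is used because SAD is a distance.

```python
import numpy as np

def sad_score(v_test, v_ref):
    # Sum of Absolute Differences, Eq. (1): lower means a closer match.
    return np.abs(v_test - v_ref).sum()

def min_max_normalize(scores, alpha, beta):
    # Min-max normalization, Eq. (4): maps raw scores into a common
    # similarity range; (alpha, beta) = (1, 1) turns distances into
    # similarities, (0, -1) keeps similarities oriented as-is.
    d = np.asarray(scores, dtype=float)
    return alpha - beta * (d - d.min()) / (d.max() - d.min())

# Toy identification run over N = 4 enrolled templates.
rng = np.random.default_rng(0)
templates = [rng.random((128, 128)) for _ in range(4)]
probe = templates[2] + 0.01 * rng.random((128, 128))  # near-copy of user 2

distances = [sad_score(probe, t) for t in templates]      # score vector D, Eq. (3)
similarities = min_max_normalize(distances, alpha=1.0, beta=1.0)
best_user = int(np.argmax(similarities))                  # best matching score Do
```

Here the probe is closest to template 2, so the normalized similarity of user 2 is the maximum of the vector and that user is identified.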
Fig. 3. Block diagram of the proposed feature extraction methods (2D-CNT, concatenation, PCA, HMM). The matching scores output by the various sub-systems are heterogeneous; score normalization is needed to transform these scores into a common domain prior to combining them. Thus, a min-max normalization scheme was employed to transform the computed scores into similarity scores in the same range:

D̄ = α − β · (D − min(D)) / (max(D) − min(D))  (4)

where D̄ represents the normalized vector and the two parameters (α, β) are equal to (1, 1) and (0, −1) for the first and second sub-systems, respectively. These scores are then compared and the highest score is selected; the best matching score is Do = max(D̄). VI. FUSION TECHNIQUE Multimodal biometric fusion is a solution to the limitations imposed by unimodal systems. Recently, multimodal biometric fusion techniques have attracted increasing attention among researchers, in the hope that the supplementary information between different biometrics might improve the identification process [10]. Matching score level fusion is the most common approach due to the ease of combining the scores generated by different sub-systems. Thus, during the system design we experiment with five different fusion rules: sum-score (SUM), min-score (MIN), max-score (MAX), mul-score (MUL) and sum-weighting-score (WHT). VII. EXPERIMENTAL RESULTS A. Experimental database We evaluate our method on the multispectral palmprint database from the Hong Kong Polytechnic University (PolyU) [11]. The database contains images captured with visible and infrared light. Multispectral palmprint images were collected from 250 volunteers, including 195 males and 55 females. The age distribution is from 20 to 60 years old. It has a total of 6000 images obtained from 500 different palms. These samples were collected in two separate sessions. In each session, the subject was asked to provide 6 images for each palm. Therefore, 24 images of each illumination from 2 palms were collected from each subject. The average time interval between the first and second sessions was about 9 days. B. Unimodal systems test results In order to obtain the performance characteristics, the feature vector of an unknown PLP (or PLV) is presented to the system, which tries to match this modality with all stored templates (or models) in the database. In our experiment, three images of each palm were randomly selected for enrollment and the other nine were taken as the test set. Thus, 3600 genuine comparisons and impostor comparisons are generated. The goal of this experiment was to evaluate the system performance when using information from each modality (PLP and PLV). In addition, for the second sub-system (HMM based) and in order to find the best parameters of the ergodic HMM model, we empirically chose the number of Gaussians in the Gaussian Mixture Model (GMM) equal to 1 and the number of HMM states equal to 4. After applying the PCA technique, and in order to select the best number of PC vectors for V2, we conducted several experiments comparing the Genuine Acceptance Rate (GAR) for several sets of PCs (several V2) and finding the set that gives the best identification rate. Thus, only 16 PC vectors for both modalities (PLP and PLV) are enough to achieve a good representation. 1) PLP based system: This section describes the results of the proposed methods when using the PLP modality. In the open set identification scenario, the effectiveness of the vector based feature representation in comparison with the model based feature representation can be seen in Fig.
4.(a), where the Receiver Operating Characteristic (ROC) curves (False Rejection Rate (FRR) against False Acceptance Rate (FAR)) and the Equal Error Rate (EER) obtained from the experimental results are illustrated. From this figure, we can conclude that the HMM based feature representation (second sub-system) achieved the best performance: it can achieve an EER equal to % at a threshold To = . Very poor results are obtained when using the vector based feature representation (first sub-system); in this case the system works with an EER equal to % at a threshold To = . In the case of closed set identification, the result is presented as the Rank of Perfect Recognition (RPR) and Rank-One Recognition (ROR) in Table 1, which presents the experimental results obtained for the two feature extraction methods in the open and closed set identification cases. The best ROR is % with the lowest RPR of 345 in the case of the HMM (model based feature representation). The system can operate at an ROR of % with RPR = 389 in the case of the vector based feature representation. It is interesting to note that the processing time of the second sub-system is much lower than that of the first sub-system. 2) PLV based system: We have also evaluated the open set identification performance for the PLV modality by applying the two feature extraction methods and drawing the ROC curves as shown in Fig. 4.(b). From this figure, it is clear that our identification system achieves the best EER in

Fig. 4. Performance of unimodal identification systems. (a) ROC curves by different methods for the PLP modality, (b) ROC curves by different methods for the PLV modality and (c) ROC curves by different modalities. TABLE 1: UNIMODAL IDENTIFICATION SYSTEMS TEST RESULTS. Open set identification (To, EER) and closed set identification (ROR, RPR), for Sub-system 1 (Vector V1) and Sub-system 2 (Model M); rows PLP and PLV. the case of the vector based feature representation. However, it can operate at EER = % at the threshold To = . Also, we can observe that the HMM based feature representation provides similar accuracy (EER equal to % at To equal to ). On the other hand, for the evaluation of the system performance in the closed set identification scenario, the test results are shown in Table 1 (second line). From these results, it can be seen that the vector based feature representation always offers better results in terms of ROR ( % with the lowest RPR = 312). Finally, we plot the ROC curves (GAR against FAR) in Fig. 4.(c) for the two best identification systems. The benefit of using the PLV modality can safely be seen. Compared with other approaches in the literature, our system achieves better identification results, especially in the open set identification case. A very low error (EER = %) for a database size of 400 users proves the superiority of our proposed feature extraction methods. C. Multimodal systems test results Our work is based on two modalities and two feature extraction methods, so several multimodal systems are possible. We considered only three different cases: a multi-biometric system (fusion of PLP and PLV), a multi-algorithmic system (for each modality, fusion of the two sub-systems) and a hybrid system (fusion of the two multi-algorithmic systems based on PLP and PLV).
These cases were tested to find the one that optimizes the identification system accuracy. The fusion is done at the matching score level using different rules. 1) Multi-biometric systems: The objective of this section is to investigate the integration of the different modalities (PLP and PLV) in order to improve the system performance. In our system, different fusion rules, for both feature extraction methods, were tested to find the rule that optimizes the system accuracy. Thus, to find the best of all fusion rules, with the lowest EER, Table 2 was generated. In the open set identification scenario, the table shows that the SUM rule with the model based feature representation (M) offers the best results in terms of EER. Thus, we have a minimum EER equal to % at the threshold To = . It is clear that the MUL rule with the vector based feature representation (V1) also improves the result (0.202 % with To = ). We have also performed the closed set identification scenario by applying all fusion rules to the matching scores obtained from the different PLP and PLV based systems in the two cases V1 and M and calculating the ROR and RPR. These results (see Table 2) show that, in general, combining the two systems improves the performance in both ROR and RPR. Also, the WHT fusion rule with the model based feature representation (M) performs better than the other cases and improves the original performance. Thus, the best system can operate with ROR equal to % and RPR = . 2) Multi-algorithmic systems: In this experiment, the matching is carried out in terms of a set of SAD scores for the V1 based system and log-likelihood scores for the M based system. Our goal is to match the input modality (PLP or PLV) against every registered identity to choose the one that gives the best score. For that, we have developed a prototype PLP (PLV) identification system utilizing multiple algorithms to boost the system performance.
It is based on combining the two feature extraction methods for the purpose of deciding a PLP (PLV) match. In the multi-algorithmic system, a user is registered with the system using

TABLE 2: MULTIMODAL IDENTIFICATION SYSTEMS TEST RESULTS (PLP + PLV). Open set identification (To, EER) and closed set identification (ROR, RPR) for Vector V1 and Model M; fusion rules SUM, WHT, MUL, MAX, MIN. TABLE 3: MULTIMODAL IDENTIFICATION SYSTEMS TEST RESULTS (V1 + M). Open set identification (To, EER) and closed set identification (ROR, RPR) for PLP and PLV; fusion rules SUM, WHT, MUL, MAX, MIN. the different algorithms (V1 and M) employed, before they can be identified. During identification, the features are extracted using the two methods and matching is carried out with these extracted features against the reference database. The best scores are chosen in the two sub-systems (SAD and log-likelihood) separately. The two best scores are then fused using one of the five fusion rules described above. Finally, this final score is used to decide whether to accept or reject the person. The open/closed set identification tests, reported in Table 3, aim to show the advantage of using a multi-algorithmic system; in this table we report the results for the two modalities (PLP and PLV). First, the PLP modality is examined: the open set identification test, reported in Table 3, shows that the SUM rule offers the best results, with a lower EER equal to % at To = . In the closed set identification case, the SUM rule again offers better results than the other fusion rules. It can provide an ROR equal to % and an RPR = 321. Therefore, the PLP based multi-algorithmic system can achieve higher accuracy than the PLP based unimodal system. Second, we report the experiments using the PLV modality (Table 3). This table shows that the MAX rule offers the best results in terms of EER, with % at To = , for the open set identification case. The other fusion rules, SUM, WHT, MUL and MIN, give %, %, % and %, respectively.
Finally, the closed set identification results (Table 3) show that the SUM rule gives the best identification rate, with an ROR equal to % and RPR = 176. The WHT rule gives the same ROR but produces a higher RPR. This multi-algorithmic identification scheme considerably improved the identification accuracy. 3) Hybrid systems: In the previous subsections, information integration involved looking for complementary information present in a single biometric trait, PLP or PLV, or in a single feature extraction method, V1 or M. To further enhance the performance of the identification system, the two best multi-algorithmic systems (PLP based on V1 & M and PLV based on V1 & M) are fused. To find the best fusion rule, experimental results at the EER point are shown in Table 4 (performance of the open/closed set identification system). The open set identification test results show that the SUM, WHT and MUL rule based fusion schemes achieve the best performance, with a minimum EER equal to % at thresholds To equal to , and for the SUM, WHT and MUL rules, respectively. The MAX rule gives an EER equal to % (To = ). Finally, the MIN rule provides an EER (close to that of the MAX rule) equal to % (To = ). Compared with other existing multimodal identification systems, the proposed identification system achieves better results in terms of EER. In the case of closed set identification, the SUM rule based fusion again has the best performance. Thus, the best ROR is % with a lowest rank of 132. Also, the WHT and MUL rules give the same ROR but with RPRs equal to 150 and 182. A similar result is given by the MIN rule: ROR = % with RPR = 288. Finally, the MAX rule provides the poorest result (ROR = % and RPR = 110). From these results, the performance of the closed set identification system is significantly improved by the fusion. 4) Comparative study: A near-infrared image can provide more information than color alone and improves the accuracy obtained with color.
Thus, near-infrared and visible image fusion has been successfully used for visualization purposes. In general, the information in the near-infrared (PLV) and visible (PLP) images is independent and complementary. According to some specific fusion schemes, these images are fused in order to construct a more effective biometric system. In order to choose the best multimodal identification system, we have carried out a comparative study between the different systems. For this comparison, a graphical relationship can be established (see Fig. 5.(a)). This figure summarizes the results in terms of the ROC curves. We can observe, firstly, that the multimodal systems always give a lower error and can be

Fig. 5. Performance of multimodal identification systems. (a) ROC curves by multimodal systems, (b) ROC curves for the best system (hybrid) and (c) CMC curves for the best system (hybrid). TABLE 4: MULTIMODAL IDENTIFICATION SYSTEMS TEST RESULTS (PLP {V1 + M} + PLV {V1 + M}). Open set identification (To, EER) and closed set identification (ROR, RPR); fusion rules SUM, WHT, MUL, MAX, MIN. used in several applications such as access control, surveillance systems and physical building security. Second, the hybrid system gives an excellent performance for the open set identification scenario. This system works with an EER equal to % for a database size of 400 users, which reflects the number of employees in medium-sized companies. Finally, the ROC and Cumulative Match Characteristic (CMC) curves for the best system are shown in Fig. 5.(b) and Fig. 5.(c), respectively. VIII. CONCLUSION AND FURTHER WORKS This work describes the design and development of a multimodal biometric personal identification system based on features extracted from PLP and PLV. Unimodal systems suffer from various problems affecting system performance; these problems are effectively handled by multimodal systems. In this paper, two different sub-systems derived from each modality (PLP and PLV) were used. Fusion of the proposed sub-systems is performed at the matching score level to generate a fused matching score, which is used for recognizing an image. The feature extraction process uses two methods (V1 and M). The experimental results, obtained on a database of 400 users, showed very high identification accuracy. They also demonstrate that combining different systems significantly improves the accuracy of the system. The comparative study we carried out has shown that the performance of the hybrid system is considerable and promising.
For further improvement, our future work will explore the use of other palm modalities (3D shape, multispectral and hyperspectral palmprint) and other biometric modalities, such as hand geometry and finger-knuckle-print, as well as other fusion levels, such as the feature and decision levels. We will also focus on performance evaluation in both phases (verification and identification) using a large database. REFERENCES [1] Ajay Kumar and David Zhang, Improving Biometric Authentication Performance From the User Quality, IEEE Transactions on Instrumentation and Measurement, vol. 59, no. 3, March . [2] Jinrong Cui, Yong Xu, Three dimensional palmprint recognition using linear discriminant analysis method, International Conference on Innovations in Bio-inspired Computing and Applications, pp. , . [3] Abdallah Meraoumia, Salim Chitroub and Ahmed Bouridane, Palmprint and Finger-Knuckle-Print for efficient person recognition based on Log-Gabor filter response, Analog Integrated Circuits and Signal Processing, vol. 69 (1), pp. 17-27, . [4] Jinfeng Yang, Yihua Shi, Jinli Yang, Personal identification based on finger-vein features, Computers in Human Behavior, vol. 27, pp. , . [5] Ashok Rao, Mohammad Imran, Raghavendra R, Hemantha Kumar G, Multibiometrics: analysis and robustness of hand vein & palm print combination used for person verification, International Journal of Emerging Trends in Engineering and Technology, vol. I, no. 1, pp. 11-20, . [6] D. Zhang, Z. Guo, G. Lu, L. Zhang, and W. Zuo, An online system of multispectral palmprint verification, IEEE Trans. Instrum. Meas., vol. 59, no. 2, pp. , Feb. . [7] E.R. Ardabili, K. Maghooli, E. Fatemizadeh, Contourlet Features Extraction and AdaBoost Classification for Palmprint Verification, Journal of American Science, vol. 7(7), . [8] M.S. Bartlett, J.R. Movellan, and T.J.
Sejnowski, Face recognition by independent component analysis, IEEE Transactions on Neural Networks, vol. 13, no. 6, pp. , . [9] Harun Uguz, Ahmet Arslan, Ibrahim Turkoglu, A biomedical system based on hidden Markov model for diagnosis of the heart valve diseases, Pattern Recognition Letters, vol. 28, pp. , . [10] A. Meraoumia, S. Chitroub and A. Bouridane, Fusion of Finger-Knuckle-Print and Palmprint for an Efficient Multi-Biometric System of Person Recognition, IEEE International Conference on Communications (ICC), Kyoto, Japan, pp. 1-5, . [11] The Hong Kong Polytechnic University (PolyU) Multispectral Palmprint Database. Available at:

Core Points Detection for Touch Based and Touch-less Fingerprint Images Salah Ahmed OTHMAN, Laboratory of Signals and Images (LSI), University of Science and Technology of Oran Mohamed Boudiaf, Oran, Algeria. BOUDGHENE STAMBOULI Tarik, Laboratory of Signals and Images (LSI), University of Science and Technology of Oran Mohamed Boudiaf, Oran, Algeria. Abstract: Singular point detection is an important task for fingerprint image classification. Two types of singular points, called core and delta points, are claimed to be enough to classify fingerprint images. For a touch-less fingerprint recognition system, the fingerprint image presents some challenging problems: low contrast between the ridge and valley pattern, non-uniform lighting, motion blur and defocus due to the small depth of field of the digital camera. All of these problems make detection of the core point more difficult than in images from a touch-based fingerprint system. In this work we develop a simple method to detect the core points of fingerprint images acquired by a touch-less system as well as a touch-based system, based on an orientation consistency measure. Keywords: fingerprint; core point; touch-less fingerprint; orientation consistency. I. INTRODUCTION Fingerprints are the most widely used human characteristic for people identification. A fingerprint consists of ridges and valleys, and each individual has a fingerprint different from all others. Automatic Fingerprint Identification Systems (AFIS) are used to identify individuals based on the fingerprint pattern. An AFIS can be a touch-based or a touch-less fingerprint system; the difference between them is that the touch-less system uses a digital camera to acquire the fingerprint image, whereas the touch-based system uses live-acquisition techniques.
The AFIS available today are in general touch-based fingerprint systems, but these systems suffer from several problems, such as contamination, which occurs because different people place their fingerprints on the same interface. This produces a low quality fingerprint image. Another problem is due to contact pressure, which creates physical distortions. In general, the durability of a touch-based fingerprint system is weakened under the pressure of use [1]. To overcome these problems, researchers started to develop touch-less fingerprint systems: they use a digital camera, the fingers are not in contact with a sensor, and the acquired fingerprint images are distortion free [2]. While there are strong advantages to using a digital camera, there are also drawbacks. Digital camera images cannot easily be processed due to the color content and poor contrast between ridges and valleys on the fingertip. The depth of field of the camera is small, thus some parts of the fingerprint are in focus and others out of focus. Fig. 1. (a) Touch based fingerprint image. (b) Touch-less fingerprint image captured by webcam. The core point is indicated by a circle and the delta points by triangles. Unfortunately, there is not much work on touch-less fingerprint systems; existing works mainly focus on the ridge structure extraction phases [3], [4] and on comparison between touch-based and touch-less fingerprint systems [5]. A fingerprint usually includes two types of features: at the local level, minutiae (ridge endings and ridge bifurcations), and at the global level, singular points, core and delta (see Fig. 1). The singular points can be viewed as the points where the orientation field is discontinuous. Core points are the points where the innermost ridge loops are at their steepest. Delta points are the points where three patterns deviate. Singular points change from one image to another depending on the type/class of the fingerprint.
For instance, in an arch type fingerprint the delta point is absent, while other types, like the whorl, contain more than one core or delta point. Once detected, singular points can be used to classify fingerprints in order to reduce the search space in recognition. Numerous studies have been proposed in the literature for singular point detection. [7] used variations of the Poincaré index for singular point detection. Bazen and Gerez [8] used the directional field with the Poincaré index to detect singular points. The Poincaré method is reliable for good images but gives spurious singular points in noisy images. Zheng et al. [9] used a curvature-based method to detect singular points. Geevar et al. [6] used orientation consistency estimation of the squared gradient image to detect singular points of fingerprint

images. Orientation consistency estimation is defined as how well the orientations over a neighborhood are consistent with the dominant orientation. Geevar et al. [6] considered the core and delta points as regions in the fingerprint where the ridge orientation has a local minimum of orientation consistency. In our setting, the core point block (center of the image) contains most of the ridge characteristics (ridge endings and ridge bifurcations), and a low-cost camera is used that offers a relatively high resolution (640 x 480) compared with a touch-based fingerprint sensor but with a small depth of field; this makes the delta points hard to see. We propose to use the orientation consistency estimation method to detect the core points of touch-less and touch-based fingerprint system images. For the touch-less fingerprint system a simple webcam (Intex IT-308WC) is used, and three conditions must be checked before acquisition: the focus of the webcam lens must be fixed according to the distance; the fingerprint must be vertical in the image, which gives a unique orientation of the fingertip; and the illumination of the webcam and the surrounding environment must be uniform. The rest of the paper is organized as follows. Section 2 presents the normalization and segmentation of the fingerprint image. Section 3 studies the ridge orientation estimation and the orientation consistency measure. Section 4 gives the steps of core point detection. Section 5 gives the experimental results, and conclusions are given in Section 6. II. NORMALIZATION AND SEGMENTATION A. Normalization Normalization is the first preprocessing operation; it is done to minimize the non-uniform lighting problem. The main purpose of normalization is to reduce the variations in grayscale values along ridges and valleys, which facilitates the subsequent processing. The input fingerprint image is normalized so that it has a pre-specified mean and variance. Normalization is done at the grayscale level.
The touch-based system gives the fingerprint image directly in grayscale, so we can normalize it directly. For an image captured by the touch-less fingerprint system, a conversion from RGB to grayscale is done first. The grayscale image is then given as input to the normalization block. Let I(i, j) denote the gray-level value at pixel (i, j), M and V denote the mean and variance of I, respectively, and N(i, j) denote the normalized gray-level value at pixel (i, j). The normalized image is defined as follows:

N(i, j) = M0 + sqrt( V0 · (I(i, j) − M)² / V ),  if I(i, j) > M
N(i, j) = M0 − sqrt( V0 · (I(i, j) − M)² / V ),  otherwise  (1)

where M0 and V0 are the desired mean and variance values, respectively. Normalization is a pixel-wise operation; it does not change the clarity of the ridge and valley structures. Fig. 2. Fingertip skin-color based segmentation (first two columns) and adaptive threshold (third column): input images (top) and output images (bottom). B. Segmentation In parallel with the normalization process, fingerprint segmentation is also carried out. Segmentation is the process of separating the foreground regions of the image from the background regions. The foreground regions correspond to the clear fingerprint area containing the ridges and valleys, which is the Region Of Interest (ROI). The segmentation step is important in determining the accuracy of the orientation, since noise in the background may give false orientations and lead to false detection of core points. In this work, segmentation of the image captured by the touch-based fingerprint system is done by binarizing the image with an adaptive threshold value and morphological operators, while segmentation of the image captured by the touch-less fingerprint system is done using skin color detection with morphological filters, such as the hole-filling operator and the erosion operator. First, the image is converted from the RGB color space to the YCbCr color space.
In this color space, skin occupies a particular portion of the space [11]. Segmentation of the skin color can therefore be limited to a segmentation of the chrominance components Cb and Cr. Segmentation is performed using an upper and a lower bound for each channel, after which regularization is applied using two simple morphological filters: first the largest connected object (the fingertip) is selected and the holes inside it are filled; then all the remaining smaller objects are removed. The final step is an erosion performed using a structuring element (a square of 1/40 of the image width). This last step is required to deselect the borders of the fingertip, where the ridges are typically not correctly acquired (see Fig. 2).

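A minimal Python sketch of the normalization of Eq. (1) and the YCbCr skin-color segmentation described above. This is our own illustration, not the authors' code: the Cb/Cr bounds are common literature values, not necessarily the ones used in the paper, and the morphological cleanup (hole filling, erosion) is omitted.

```python
import numpy as np

def normalize(img, m0=100.0, v0=100.0):
    # Grayscale normalization, Eq. (1): pixel-wise mapping to a
    # pre-specified mean m0 and variance v0.
    m, v = img.mean(), img.var()
    dev = np.sqrt(v0 * (img - m) ** 2 / v)
    return np.where(img > m, m0 + dev, m0 - dev)

def skin_mask(rgb, cb_bounds=(77.0, 127.0), cr_bounds=(133.0, 173.0)):
    # Skin-color segmentation: threshold the chrominance channels Cb and
    # Cr of the YCbCr space (BT.601 conversion). The bounds here are
    # assumed values; the paper's own bounds are not given.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return ((cb_bounds[0] <= cb) & (cb <= cb_bounds[1]) &
            (cr_bounds[0] <= cr) & (cr <= cr_bounds[1]))

# Demo: values below the image mean map below m0, values above map above.
img = np.array([[10.0, 20.0], [30.0, 40.0]])
out = normalize(img)
is_skin = bool(skin_mask(np.array([[[200.0, 150.0, 120.0]]]))[0, 0])   # skin-like pixel
is_blue_skin = bool(skin_mask(np.array([[[0.0, 0.0, 255.0]]]))[0, 0])  # non-skin pixel
```

In a full pipeline the boolean mask would then be cleaned with hole filling and erosion before masking the normalized image.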
III. RIDGE ORIENTATION ESTIMATION AND ORIENTATION CONSISTENCY MEASURE A. Ridge orientation estimation The ridge orientation is estimated using the least-squares method that averages the squared gradients, proposed by Jain [10]. This method estimates the orientation of each block rather than each pixel. The steps for calculating the orientation at pixel (i, j) for touch-based and touch-less fingerprint images are as follows: 1. Divide the segmented image into non-overlapping blocks of size W x W centered at pixel (i, j). W should not be so big as to overlap different ridges, nor so small as to miss the ridge details. 2. For each pixel (i, j), compute the gradients ∂x(i, j) and ∂y(i, j), which are the gradient magnitudes along the x (horizontal) and y (vertical) directions. Two Sobel filters are used for this. 3. Compute the local orientation of the block centered at pixel (i, j) using:

Gxx = Σ_{(u,v)∈W} ∂x(u, v)²  (2)
Gyy = Σ_{(u,v)∈W} ∂y(u, v)²  (3)
Gxy = Σ_{(u,v)∈W} ∂x(u, v) ∂y(u, v)  (4)
θ(i, j) = (1/2) tan⁻¹( 2Gxy / (Gxx − Gyy) )  (5)

where θ(i, j) is the least-squares estimate of the local orientation of the block centered at pixel (i, j). 4. Due to the presence of noise, corrupted ridge and valley structures, minutiae, etc. in the input image, the estimated local ridge orientation θ(i, j) may not always be correct. A low-pass filter is hence used to correct the local ridge orientation. To apply a low-pass filter, the orientation image must first be converted to a continuous vector field:

Φx(i, j) = cos 2θ(i, j)  (6)
Φy(i, j) = sin 2θ(i, j)  (7)

where Φx and Φy are the x and y components of the vector field, respectively. After the vector field has been computed, Gaussian smoothing is performed as follows:

Φ'x(i, j) = Σ_{(u,v)∈wΦ} G(u, v) Φx(i − uw, j − vw)  (8)
Φ'y(i, j) = Σ_{(u,v)∈wΦ} G(u, v) Φy(i − uw, j − vw)  (9)

where G is a Gaussian low-pass filter of size wΦ x wΦ. 5. Compute the local ridge orientation at pixel (i, j) using:

O(i, j) = (1/2) tan⁻¹( Φ'y(i, j) / Φ'x(i, j) )  (10)

Fig. 3.
Touch-less and touch-based fingerprint images with their orientation maps. B. Orientation consistency measure After finding the ridge orientation map (see Fig. 3), the next step is to calculate the orientation consistency. The orientation consistency of the intensity gradient image can be used to accurately locate the singular points. The intensity gradient vector image G(i, j) is estimated by differentiation of the input image f(i, j):

G(i, j) = G_i(i, j) e^{jθ(i, j)}  (11)

where G_i(i, j) is the gradient amplitude, defined as

G_i(i, j) = sqrt( G_x² + G_y² )  (12)

where G_x and G_y are the gradients in the horizontal and vertical directions and θ represents the orientation; they are computed by

G_x(i, j) = ∂f(i, j)/∂i,  G_y(i, j) = ∂f(i, j)/∂j,  θ(i, j) = tan⁻¹( G_y / G_x )  (13)

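The block-wise orientation estimation of Eqs. (2)-(5) can be sketched as follows. This is an illustrative simplification of ours: simple image differences replace the paper's Sobel filters, and the Gaussian smoothing of Eqs. (6)-(10) is omitted.

```python
import numpy as np

def block_orientation(img, w=16):
    # Least-squares block orientation, Eqs. (2)-(5). theta is the
    # dominant gradient direction of each w x w block; the ridge
    # direction is perpendicular to it.
    gy, gx = np.gradient(img.astype(float))  # vertical, horizontal gradients
    h, wd = img.shape
    theta = np.zeros((h // w, wd // w))
    for bi in range(h // w):
        for bj in range(wd // w):
            sl = np.s_[bi * w:(bi + 1) * w, bj * w:(bj + 1) * w]
            gxx = np.sum(gx[sl] ** 2)              # Eq. (2)
            gyy = np.sum(gy[sl] ** 2)              # Eq. (3)
            gxy = np.sum(gx[sl] * gy[sl])          # Eq. (4)
            theta[bi, bj] = 0.5 * np.arctan2(2 * gxy, gxx - gyy)  # Eq. (5)
    return theta

# Synthetic "horizontal ridges": intensity varies only with the row
# index, so the dominant gradient direction is vertical (pi/2) everywhere.
ridges = np.sin(2 * np.pi * np.arange(64) / 8)[:, None] * np.ones((64, 64))
theta = block_orientation(ridges, w=16)
```

Using `arctan2` instead of a plain arctangent keeps the estimate well defined when Gxx equals Gyy.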
We treat core points as regions in the fingerprint where the ridge orientations have a local minimum of orientation consistency. Several such measures have been proposed in the literature, such as coherence, reliability and consistency. The coherence and reliability of the squared gradient are defined as measures of how well the orientations over a neighborhood point in the same direction. Since we are only interested in core point detection in this work, and for more accurate measurement, we use the squared gradient image together with the orientation image to calculate the orientation consistency [12]. The orientation consistency (CE) is defined as how well the orientations over a neighborhood are consistent with the dominant orientation and can be expressed as

CE = sqrt( ( Σ_N G_i(i, j) cos 2O(i, j) )² + ( Σ_N G_i(i, j) sin 2O(i, j) )² ) / Σ_N G_i(i, j)  (14)

where G_i is the squared gradient image computed using Eq. (12) and N is a window of size N x N in the smoothed orientation O(i, j). If all the orientations in N are in the same direction then CE will be high (CE = 1), while if the orientations are all in different directions then CE will be very low (CE ≈ 0). Thus, the orientation consistency measure ranges from 0 to 1. A value of CE less than a threshold Tc can then be used to accurately locate the core points in the input image; a good choice is in the range 0.1 < Tc < 0.5. IV.
CORE POINT DETECTION The proposed core point detection in a fingerprint image has the following processing steps: 1) Normalize the input image using Eq. (1); the main purpose of normalization is to reduce the variation in gray-scale levels. 2) In parallel with normalization, segment the fingerprint to separate the background from the foreground, that is, to obtain the binary mask of the fingertip; in this work we use the skin color detection method with two morphological filters for webcam fingerprint images, and an adaptive threshold value with morphological operators for touch-based fingerprint images. 3) Once the binary mask is obtained, use it to extract the foreground of the normalized image; then estimate the ridge orientation using Eqs. (2)-(10). 4) From the orientation image, Eq. (10), and the squared gradient image, Eq. (12), compute the orientation consistency using Eq. (14), taking each block of N x N neighborhood size. The value of CE will be in the range 0 to 1. Search within the fingertip for the block(s) with minimum local consistency (see Fig. 4). 5) Find the blocks with minimum local consistency by CE < Tc. Multiple blocks will be detected; they will be core points, delta points or spurious points. Localization of the core points is done by scanning the thresholded image using a window of size N x N. To distinguish core points from delta points and spurious points, the coordinates of the fingertip are used: the minima of local consistency that lie near the center of the fingertip are selected as core points. V. EXPERIMENTAL RESULTS In this work, we have used windows of 11x11 and 5x5 for calculating the orientation field from the touch-less and touch-based fingerprint images, respectively, and windows of 5x5 and 3x3 for computing the orientation consistency measure field.
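Steps 4 and 5 above can be sketched as follows. This is our own illustration, assuming block-wise (rather than sliding-window) scanning; disambiguating core points by their distance to the fingertip center is only indicated, not implemented.

```python
import numpy as np

def orientation_consistency(grad_mag, orient, n=5):
    # Orientation consistency CE, Eq. (14), over non-overlapping n x n
    # windows of the gradient-magnitude and smoothed-orientation fields.
    # CE = 1 when all orientations agree, near 0 when they conflict.
    h, w = orient.shape
    ce = np.zeros((h // n, w // n))
    for bi in range(h // n):
        for bj in range(w // n):
            sl = np.s_[bi * n:(bi + 1) * n, bj * n:(bj + 1) * n]
            g, o = grad_mag[sl], orient[sl]
            num = np.hypot((g * np.cos(2 * o)).sum(), (g * np.sin(2 * o)).sum())
            ce[bi, bj] = num / max(g.sum(), 1e-12)
    return ce

def core_candidates(ce, t_c=0.3):
    # Blocks below the threshold Tc (the paper suggests 0.1 < Tc < 0.5):
    # candidate core, delta or spurious points, to be disambiguated by
    # their distance to the fingertip center.
    return np.argwhere(ce < t_c)

# Demo: a field that is uniform everywhere except one conflicting block.
grad = np.ones((10, 10))
orient = np.zeros((10, 10))
orient[0:5, 6:10:2] = np.pi / 2       # mix two orientations in block (0, 1)
ce = orientation_consistency(grad, orient, n=5)
cands = core_candidates(ce, t_c=0.3)
```

The uniform blocks score CE = 1, while the conflicting block scores well below the threshold and is flagged as the only candidate.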
The proposed core point detection method has been tested with: First, a sample set of touch-based fingerprint images taken from the Neurotechnologija web site [13], named Verifinger_Sample_DB. This database contains 360 different images of 45 individuals; 120 sample images were taken for testing, including 9 arch-type fingerprints. Second, a sample set of touch-less fingerprint images acquired with a simple webcam (Intex IT-308WC) from 20 persons, with 6 different impressions of the same finger. 74 IT4OD 2014

120 sample images were tested, including 6 arch-type fingerprints. First, the true core points of each fingerprint image were marked manually. For the touch-based database, 129 of 139 core points were correctly identified (92.8% accuracy), and for the touch-less database, 119 of 132 core points were correctly identified (90.15% accuracy). Table 1 summarizes the test results. The efficiency of the proposed method depends on the acquisition conditions of the fingerprint, i.e. the quality of the fingerprint images. Fig. 5 shows examples of core point detection by CE.

TABLE I
SUMMARY OF TEST RESULTS

Images tested | Touch-less images | Touch based images
Total number of fingerprints taken for testing | 120 | 120
Total number of arch type images | 6 | 9
Total number of core points manually marked (excluding arch type) | 132 | 139
Number of correctly identified core points using CE | 119 | 129
Number of spurious core points detected (excluding arch type) | — | —

Fig. 5. Some results of core point detection by CE: the first two columns show touch-less fingerprint images and the third column shows touch-based fingerprint images.

VI. CONCLUSION
In this paper, we have presented a simple method for detecting the core points of touch-less and touch-based fingerprint images using an orientation consistency measure. Since the depth of field of a webcam is small and the zone around core points contains most of the ridge characteristics, we used the consistency of the orientation field to detect core points. The consistency of the orientation field is minimum in singular regions.
This method finds the local minimum orientation consistency to detect the core points. Experimental results show that the proposed method is capable of detecting core points in touch-less as well as touch-based fingerprint images. Higher accuracy can be obtained by improving the acquisition conditions of the fingerprint image and the orientation smoothing algorithm to avoid spurious detection of core points. Our future work will focus on classifying touch-less fingerprint images based on the small region around core points, so that the computational time for matching fingerprints in a huge database may be reduced.

References:
[1] M. Kokou and A. Nait Ali, "Fingerprint Characteristic Extraction by Ridge Orientation: An Approach for a Supervised Contactless Biometric System," International Journal of Computer Applications, Vol. 16, No. 6, February.
[2] B. Y. Hiew, A. B. J. Teoh and D. C. L. Ngo, "Preprocessing of Fingerprint Images Captured with a Digital Camera," 9th International Conference on Control, Automation, Robotics and Vision (ICARCV 2006), Singapore.
[3] C. Lee, "Preprocessing of a Fingerprint Image Captured with a Mobile Camera," ICB 2006, LNCS 3832.
[4] D. Lee, W. Jang, D. Park, S. Kim and J. Kim, "A Real-Time Image Selection Algorithm: Fingerprint Recognition Using Mobile Devices with Embedded Camera," AutoID, 2005.
[5] P. Kaur, A. Jain and S. Mittal, "Touch-less Fingerprint Analysis: A Review and Comparison," I.J. Intelligent Systems and Applications, 2012.
[6] G. C. Zacharias and P. Sojan Lal, "Singularity Detection in Fingerprint Image using Orientation Consistency," Proc. of IEEE.
[7] K. Karu and A. K. Jain, "Fingerprint classification," Pattern Recognition, Vol. 29, No. 3.
[8] A. M. Bazen, "Systematic methods for the computation of the directional fields and singular points of fingerprints," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 27, No. 7, 2002.
[9] X. Zheng, "A detection of singular points in fingerprint images combining curvature and orientation field," in Proc. Int'l Conf. Intelligent Computation, 2006.
[10] A. K. Jain, S. Prabhakar, L. Hong and S. Pankanti, "Filterbank-based fingerprint matching," IEEE Trans. Image Processing, Vol. 9, No. 5.
[11] M. Abdallah, "Different Techniques of Hand Segmentation in the Real Time," International Journal of Computer Applications & Information Technology, Vol. II, Issue I, January 2013.
[12] D. Luo, Q. Tao, J. Long and X. Wu, "Orientation consistency based feature extraction for fingerprint identification," in TENCON '02 Proceedings, IEEE, Vol. 1, 2002.
[13] Neurotechnology. [Online]. Available: com/download.html

A hierarchical intrusion detection system based on combining predictions of different classifiers

Ahmed Ahmim and Nacira Ghoualmi-Zine
Laboratory of Computer Networks and Systems, Department of Computer Science, Badji Mokhtar-Annaba University

Abstract: Nowadays, cyber-attacks have become a crucial problem. The intrusion detection system (IDS) is one of the most important weapons against these cyber-attacks. A simple one-level IDS cannot combine a high true positive rate on frequent attacks, a low false alarm rate, and an acceptable true positive rate on low-frequency attacks. To deal with these limits, we propose a new hierarchical intrusion detection system. The proposed model includes two levels. The first one contains five classifiers, Repeated Incremental Pruning to Produce Error Reduction, Multilayer Perceptron, Self-Organizing Feature Map Network, C4.5 Decision Tree and Naïve Bayes, used for their high rates of correct classification of, respectively, DOS, normal behavior, Probe, R2L and U2R. Only five predictions from the first level are selected and used as inputs to the second level, which contains an RBF network as the final classifier. Experimentation on KDD99 shows that our model gives the highest accuracy and the highest detection rate compared to the results obtained by some well-known IDS models.

Index Terms: Computer Security, Intrusion Detection System, IDS, Hierarchical IDS, Hybrid IDS.

I. INTRODUCTION Today, computer networks are widely used to perform various activities such as shopping, education, and communication. The security of computer networks is a crucial problem due to the importance and sensitivity of the communicated information. The cyber-attack represents one of the most dangerous secret weapons.
Computer security brings together the methods, techniques, and tools used to protect systems, data, and services against accidental or intentional threats, so as to ensure confidentiality, availability, and integrity [1]. Nowadays, different techniques and methods are developed to implement a security policy, such as authentication, cryptography, firewalls, proxies, antivirus software, Virtual Private Networks (VPN), and Intrusion Detection Systems (IDS). An IDS is a software or hardware system that automates the monitoring of events occurring in computer systems or networks, analyzing them to flag probable security problems [2]. Intrusion detection systems can be classified into two categories: anomaly detection and misuse detection [3]. If an intrusion detection system uses information about the normal behavior of the system it monitors, we qualify it as anomaly detection; if it uses information about attacks, we qualify it as misuse detection [4]. Different methods and techniques are used in intrusion detection. In the early stages, artificial intelligence and machine learning were used. But facing problems like huge network traffic volumes, highly imbalanced data distributions, the difficulty of drawing decision limits between normal and abnormal behavior, and the requirement for continuous adaptation to a constantly changing environment, these techniques have shown their limitations [5]. Data mining techniques are used to deal with these limitations, with various techniques employed, such as Fuzzy Logic [6], Naïve Bayes [7], RIPPER [8], Decision Trees [9], Support Vector Machines [10], and Artificial Neural Networks [11]. Although data mining techniques offer many advantages for intrusion detection, a simple one-level IDS has limits such as a very low detection rate for low-frequency attacks, a high false alarm rate, and the inability to combine a high detection rate with a low false alarm rate.
To overcome these limits, hierarchical and hybrid IDSs have been increasingly used over the last decade [12]. Hierarchical and hybrid IDSs combine different classifiers at different levels in order to increase the detection rate and decrease the false alarm rate. Currently, there are several hierarchical and hybrid models, such as the hybrid flexible neural-tree-based IDS [13], hybrid machine learning [14], the hierarchical neural network model [15], and hierarchical clustering with support vector machines [16]. This paper is organized as follows. Section 2 presents the related works. Section 3 outlines the general structure and the operation mode of our model. The experiments are discussed in Section 4. Finally, Section 5 draws the conclusion. II. RELATED WORKS The main drawbacks of a simple one-level IDS are a low detection rate for low-frequency attacks, a high false alarm rate, and the inability to combine a high detection rate with a low false alarm rate. To deal with these inconveniences and enhance IDS performance, several hierarchical and hybrid models based on different

types of classifiers have been developed, such as: an IDS based on hierarchical clustering and support vector machines [16], an IDS based on an evolutionary soft computing model using neuro-fuzzy classifiers [17], a multiple-level hybrid classifier for IDS using Bayesian clustering and decision trees [18], an IDS based on hybrid RBF/Elman neural networks [19], and FC-ANN [20]. Toosi and Kahani (2007) [17] proposed a hierarchical IDS based on neuro-fuzzy networks, a fuzzy inference approach, and genetic algorithms. First, a set of parallel neuro-fuzzy classifiers performs the initial classification. Then, a fuzzy inference system based on the outputs of the neuro-fuzzy classifiers makes the final classification decision, with a genetic algorithm optimizing the structure of the fuzzy decision engine. Xiang et al. (2008) [18] developed a multiple-level hybrid IDS based on supervised tree classifiers and unsupervised Bayesian clustering. This model has four stages of classification, similar to the well-known three-level tree classifier [21]. Tong et al. (2009) [19] proposed a hybrid IDS based on RBF/Elman neural networks: the RBF neural network is employed as a real-time pattern classifier, and the Elman neural network is employed to restore the memory of past events. FC-ANN [20] is a hierarchical IDS based on neural networks and fuzzy clustering. It is composed of three layers: the first layer is a fuzzy clustering module that generates the different training subsets; the second layer consists of the different neural networks trained to form the different base models; the last layer is a fuzzy aggregation module employed to aggregate these results and reduce the detected errors. Horng et al. (2011) [16] proposed an IDS that combines a hierarchical clustering algorithm, a simple feature selection procedure, and the SVM technique. First, the hierarchical clustering algorithm is used to generate training instances.
Then, a simple feature selection procedure is applied to eliminate unimportant features from the training set. Finally, the obtained SVM model classifies the network traffic data. The related works have addressed some limits of the simple one-level IDS, such as combining a high detection rate with a low false alarm rate and giving a better detection rate for low-frequency attacks. Despite these advantages, they are unable to increase the detection rate of low-frequency attacks without decreasing the detection rate of frequent attacks or increasing the false alarm rate. III. OUR MODEL Our work aims to build a new hierarchical IDS that better detects low-frequency attacks without losing a high detection rate on frequent attacks or a low false alarm rate. In this section, we present the different components of the new hierarchical IDS and their usefulness. As shown in Figure 1, our model is composed of two levels. The first level contains different types of classifiers, selected for their highest detection rate on one or more classes of network connection. Selecting the best classifiers for the first level aims to benefit from their strong points in order to increase the detection rate for all categories of network connections, mainly for low-frequency attacks like U2R. Moreover, this choice of the best classifiers provides a maximum of correct estimations for each category of network connection, which makes it easier for the second level to take the right final classification decision. As illustrated in Figure 1, each classifier gives five predictions relative to the four categories of attacks and normal behavior. We keep only the prediction of the class for which each classifier was selected.
We eliminate the other predictions because they represent non-useful information that could negatively influence the learning step of the second-level classifier. The five selected predictions are used as inputs to the second level. The second level contains the final classifier, which decides whether the connection is normal behavior or an attack, using the selected outputs of the first level as inputs. The transformation of the many data set features into only five features with a direct relation to the two classes of network connection increases the accuracy of the model. Fig. 1. General structure of our model
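The two-level design described above (five per-class first-level classifiers whose selected predictions feed a final binary classifier) can be sketched as follows. This is a minimal illustration with scikit-learn stand-ins, not the paper's Weka/NeuroSolutions implementation: decision trees stand in for the five selected classifiers, an RBF-kernel SVM stands in for the RBF network, and `first_level_features` and `build_model` are hypothetical helper names.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One first-level classifier per connection class; each is kept only for
# the class it classifies best (the paper selects RIPPER for DOS, MLP for
# Normal, SOFM for Probe, C4.5 for R2L and Naive Bayes for U2R).
CLASSES = ["DOS", "Normal", "Probe", "R2L", "U2R"]

def first_level_features(models, X):
    """Keep, for each first-level model, only the prediction (here the
    class probability) of the class it was selected for; the resulting
    five columns are the inputs of the second-level classifier."""
    cols = []
    for cls, model in zip(CLASSES, models):
        proba = model.predict_proba(X)          # shape (n_samples, 5)
        k = list(model.classes_).index(cls)
        cols.append(proba[:, k])                # selected prediction only
    return np.column_stack(cols)                # shape (n_samples, 5)

def build_model(X, y):
    """Train five first-level models on 5-class labels, then the final
    binary classifier on the five selected predictions."""
    models = [DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
              for _ in CLASSES]
    F = first_level_features(models, X)
    y_bin = np.where(y == "Normal", "Normal", "Attack")
    final = SVC(kernel="rbf").fit(F, y_bin)
    return models, final
```

The key design point mirrors the paper: the second level never sees the raw features, only the five retained first-level predictions.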

A. The operation mode of our model
The operating mode of our model is composed of three stages: selecting the different classifiers of the first and second levels, the training stage, and the testing stage.
1) Selecting the classifiers of the first and second levels. To select the best classifiers for the first and second levels, we perform a comparative study between different types of classifiers. For the first level, the classifiers are compared with respect to their performance in classifying network connections into one of the five classes (DOS, Probe, R2L, U2R, and normal behavior). We select only the five classifiers that give good accuracy and the highest rate of correct classification for at least one of the five classes. For the second level, we select the classifier that gives the highest accuracy in classifying network connections into one of the two classes (Normal, Attack). To perform this second comparative study, we use the selected outputs of the first level as inputs to the second level.
2) Training stage. This stage trains our model to prepare it for the test stage. It is composed of two steps. Train the first level: we train the different classifiers of the first level with the training data set, where each feature of the data set is an input to the classifiers of this level. Train the second level: a new data set is created from the predictions of the selected classifiers of the first level. To generate this new training data set, we associate the selected prediction results with the correct label, as in Table 1. The new training data set is used to train the second-level classifier.

TABLE I. THE NEW TRAINING DATA SET
Columns: DOS prediction | Probe prediction | U2R prediction | R2L prediction | Normal prediction | Label
(Label values of the sample rows: Attack, Normal, Attack, Attack, Attack.)

3) Test stage. In this stage, we test the performance of our model after the training stage, using the test data set. Each record of the test data is processed in parallel by the different first-level classifiers. Then, the selected prediction outputs of the first-level classifiers are used as inputs to the second-level classifier. Finally, the second-level classifier makes the final decision, classifying each record as an attack or normal behavior.
IV. EXPERIMENTS This section is divided into three parts. The first details the training and test data sets. The second presents a comparative study between eight classifiers, with the aim of selecting the best ones for the first and second levels. The third presents a comparative study between our model and the related works. We performed a set of experiments on KDD99 [22], which is the most used data set for intrusion detection over the last decade [12], [5]. NeuroSolutions software [23] is used for the implementation of the neural networks, and the Weka data mining tools [24] are used for the implementation of the other classifiers. The combination of NeuroSolutions and Weka is used to implement our model. The results were obtained on a Windows PC with a Core 2 Duo 2.0 GHz CPU and 2 GB of RAM.
A. Training and Test Data Set KDD99 was derived in 1999 from the DARPA-Lincoln98 [25] data set collected by MIT's Lincoln Laboratory. KDD99 contains 39 types of attacks and normal behavior, classified into five classes: DOS attacks, U2R attacks, Probe attacks, R2L attacks, and normal behavior [22]. Table 2 summarizes the distribution of attack and normal behavior records in our training data set. The KDD99 test data set is used to evaluate the performance of our model.
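Before training, the symbolic KDD99 features are encoded and each numeric feature is min-max scaled to [0, 1], following Eq. (1) below. A minimal sketch of the per-feature scaling, assuming NumPy arrays; the guard against constant features is our addition, not stated in the paper.

```python
import numpy as np

def min_max_normalize(X):
    """Per-feature min-max scaling as in Eq. (1): each feature j is
    mapped to [0, 1] using its minimum and maximum over the data."""
    mn = X.min(axis=0)
    mx = X.max(axis=0)
    rng = np.where(mx > mn, mx - mn, 1.0)  # avoid division by zero
    return (X - mn) / rng                  # for constant features
```

For example, the feature column [1, 3, 5] maps to [0, 0.5, 1].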
TABLE II. DISTRIBUTION OF ATTACK AND NORMAL BEHAVIOR IN OUR TRAINING DATA SET

Type of connection | Number of records | Proportion
Normal | 4,500 | 22.50%
DOS | 12,319 | 61.60%
Probe | 2,130 | 10.65%
R2L | 999 | 5.00%
U2R | 52 | 0.26%

Two features (num_outbound_cmds and is_host_login) are removed because their values are identical across the training data set. To normalize the data sets, the ASCII encoding method is used to convert symbolic data to numerical values. Then, each value x_i(j) of feature j is normalized with the following equation:

x_i(j) = (x_i(j) - min(x(j))) / (max(x(j)) - min(x(j)))   (1)

B. Comparative studies of classifiers To select the best classifiers for the first and second levels of our model, we performed comparative studies. The first comparative study aims to select the classifiers that give the highest true positive

rate for DOS, Probe, U2R, and R2L, and the highest true negative rate for normal behavior. The second comparative study aims to select the classifier that gives the highest accuracy for the second level. The classifiers compared are: Naïve Bayes (NB) [27], C4.5 Decision Tree (DT) [28], Support Vector Machine (SVM) [29], Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [30], Multilayer Perceptron (MLP) [31], Self-Organizing Feature Map Network (SOFM) [32], Radial Basis Function Neural Network (RBF) [31], and Recurrent Neural Network (RNN) [33]. In these comparative studies, we used the training data set detailed in Table 2 above, with NeuroSolutions [23] for the implementation of MLP, SOFM, RBF, and RNN, and the Weka data mining tools [24] for the implementation of SVM, NB, DT, and RIPPER. In the first level, each classifier has 39 inputs, representing the 41 features of KDD99 without num_outbound_cmds and is_host_login. Following Beghdad [26], each neural network has three hidden layers containing respectively 38, 19, and 12 neurons, and an output layer with 5 neurons representing the four categories of attacks (DOS, Probe, U2R, R2L) and normal behavior; each of the other classifiers likewise gives its predictions for the four categories of attacks and normal behavior. In the second level, each classifier has 5 inputs, representing the five selected predictions of the first-level classifiers. Each neural network has two hidden layers containing respectively 15 and 7 neurons and an output layer with 2 neurons representing Attack and Normal behavior; each of the other classifiers gives its two predictions for Attack and Normal behavior. Table 3 summarizes the correct classification rates of the eight classifiers.
As illustrated in Figure 2, the best classifiers for the detection of normal behavior, DOS, R2L, Probe, and U2R are respectively MLP with a TNR of 98.99%, RIPPER with a DR of 97.92%, DT with a DR of 28.23%, SOFM with a DR of 97.10%, and NB with a DR of 22.81%.

TABLE III. THE CORRECT CLASSIFICATION OF THE EIGHT CLASSIFIERS OF THE FIRST LEVEL

Classifier | DOS | Normal | Probe | R2L | U2R
DT | 97.05% | 86.59% | 92.05% | 28.23% | 8.77%
SVM | 97.04% | 96.53% | 84.47% | 9.81% | 8.77%
NB | 96.33% | 94.17% | 89.53% | 0.78% | 22.81%
RIPPER | 97.92% | 97.34% | 84.21% | 13.95% | 10.53%
MLP | 97.49% | 98.99% | 83.20% | 7.80% | 0.00%
RBF | 96.31% | 95.14% | 77.96% | 3.45% | 0.00%
SOFM | 96.65% | 95.17% | 97.10% | 1.93% | 0.00%
RNN | 97.59% | 97.22% | 84.61% | 8.21% | 0.00%

Fig. 2. The performance of the different classifiers for the first level (y-axis: correct classification rate)

For the second level, we select the RBF network, which gives the highest accuracy compared to the other classifiers.
C. Evaluation of the new hierarchical IDS After analyzing the performance of the different types of classifiers, we exploited their strong points in order to achieve our goal. The practical structure of our new model contains, in the first level, RIPPER, SOFM, DT, and NB for their highest true positive rates in the detection of, respectively, DOS, Probe, R2L, and U2R. Moreover, MLP is selected for its highest true negative rate in the detection of normal behavior. In the second level, we use an RBF network as the final classifier to combine the predictions of the different classifiers and give the final decision. To build our model, we used the Weka data mining tools [24] to train and test the different classifiers, NeuroSolutions [23] to train and test the different neural networks, and the Java SE Development Kit 7 [34] and a MySQL database [35] to process the data set and combine the different outputs of the first level as inputs to the second level. To evaluate the performance of our new model, we compared it with the related works that used the whole KDD99 test data set [22] to evaluate their models. For our model, we used the training data set detailed in Table 2 above as the training data set, and the whole KDD99 test data set [22] as the test data set. The result of the comparison is shown in Table 4.

TABLE IV. COMPARISON BETWEEN OUR MODEL AND RELATED WORKS

 | Our model | Tossi [17] | FC-ANN [20] | Xiang [18] | Horng [16]
Normal | 98.85% | 98.20% | 99.08% | 96.80% | 99.30%
DOS | 98.68% | 99.50% | 96.70% | 98.66% | 99.50%
Probe | 96.88% | 84.10% | 80.00% | 93.40% | 97.50%
R2L | 42.77% | 31.50% | 58.57% | 46.97% | 28.80%
U2R | 67.11% | 14.10% | 76.92% | 71.43% | 19.70%
DR | 95.0% | 94.77% | 93.94% | 93.93% | 94.82%
FAR | 1.15% | 1.90% | 0.92% | 3.20% | 0.70%
Accuracy | 95.76% | 95.30% | 94.94% | 94.49% | 95.70%

As shown in Figure 3, the best model for the detection of normal behavior is the Horng [16] model with 99.30%, for DOS attacks the Horng [16] model with 99.50%, for Probe attacks the Horng [16] model with 97.50%, for R2L attacks the FC-ANN [20] model with 58.57%, and for U2R attacks the FC-ANN [20] model with 76.92%. Our model does not give the highest rate of correct classification for any single category of network connection, but it is close to the highest rate for most of them, giving 98.85% for Normal, 98.68% for DOS, 96.88% for Probe, 42.77% for R2L, and 67.11% for U2R. The time required to test all the KDD99 test data set records is seconds, so the average time required to process one record is microseconds, which demonstrates the speed of our model in processing network traffic. Globally, our model gives the highest detection rate, the highest accuracy, and a low false positive rate, the third-lowest false alarm rate among the compared models. The improvement in accuracy represents 186 additional records correctly classified compared to the best model of the related works. Moreover, our model shows its ability to better detect low-frequency attacks such as U2R and R2L without decreasing the detection rate of frequent attacks or increasing the false alarm rate, which represents a great advantage.
The comparative study between our model and the other models shows that our model has achieved the objectives of this paper: it gives the highest accuracy with 95.76%, it combines a high detection rate with a low false alarm rate, and it better detects low-frequency attacks without decreasing the detection rate of frequent attacks or increasing the false alarm rate.

Fig. 3. Comparison between our model and related works (Our model, Tossi [17], Wanga [20], Xiang [18], Horng [16])

V. CONCLUSION In this article, we have proposed a new IDS model based on the combination of different classifiers that meets the following requirements: better detection of low-frequency attacks such as U2R, a high true positive rate on frequent attacks, and a low false alarm rate. Our model includes two levels. The first one contains five classifiers, Repeated Incremental Pruning to Produce Error Reduction, Multilayer Perceptron, Self-Organizing Feature Map Network, C4.5 Decision Tree and Naïve Bayes, used for their high rates of correct classification of, respectively, DOS, normal behavior, Probe, R2L and U2R. Only five predictions from the first level are selected and used as inputs to the second level, which contains an RBF network as the final classifier. The experiments show that our model gives the highest detection rate, the highest accuracy, and a low false alarm rate compared to some well-known IDS models. Furthermore, our model has shown its ability to better detect low-frequency attacks without decreasing the detection rate of frequent attacks or increasing the false alarm rate.
VI. REFERENCES
[1] E. Cole, R. Krutz and J. Conley, Network Security Bible, Wiley Publishing, Inc.
[2] K. Scarfone and P. Mell, Guide to Intrusion Detection and Prevention Systems (IDPS), NIST Special Publication, Gaithersburg.
[3] S. Axelsson, Intrusion detection systems: A survey and taxonomy, Technical Report 99-15, Chalmers University, Goteborg, March 2000.
[4] H. Debar, M. Dacier and A. Wespi, A Revised Taxonomy for Intrusion Detection Systems, Annals of Telecommunications, Vol. 55, No. 7-8.

[5] S. W. Xiaonan and W. Banzhaf, The use of computational intelligence in intrusion detection systems: A review, Applied Soft Computing, Vol. 10, No. 1, pp. 1-35.
[6] W. Chimphlee, H. A. Addullah, M. N. M. Sap, S. Srinoy and S. Chimphlee, Anomaly-based intrusion detection using fuzzy rough clustering, in Proceedings of the 2006 International Conference on Hybrid Information Technology (ICHIT '06), IEEE Computer Society, Washington, Vol. 1.
[7] S. L. Scott, A Bayesian paradigm for designing intrusion detection systems, Computational Statistics and Data Analysis, Vol. 45, No. 1.
[8] W. Fan, M. Miller, S. Stolfo, W. Lee and P. Chan, Using artificial anomalies to detect unknown and known network intrusions, Knowledge and Information Systems, Vol. 6, No. 5.
[9] S. Paek, Y. Oh and D. Lee, sIDMG: Small-Size Intrusion Detection Model Generation of Complimenting Decision Tree Classification Algorithm, 7th International Workshop, WISA 2006, Jeju Island, Korea, August 28-30, Springer Berlin Heidelberg, pp. 83-99.
[10] Z. Zhang and H. Shen, Application of online-training SVMs for real-time intrusion detection with different considerations, Computer Communications, Vol. 28, No. 12.
[11] J. Cannady, Artificial neural networks for misuse detection, in Proceedings of the 21st National Information Systems Security Conference, Arlington, VA, USA.
[12] C. Tsaia, Y. Hsub, C. Linc and W. Lin, Intrusion detection by machine learning: A review, Expert Systems with Applications, Vol. 36, No. 10.
[13] Y. Chen, A. Abraham and B. Yang, Hybrid flexible neural-tree-based intrusion detection systems, International Journal of Intelligent Systems, Vol. 22, No. 4.
[14] T. Shon and J. Moon, A hybrid machine learning approach to network anomaly detection, Information Sciences, Vol. 177, No. 18.
[15] C. Zhang, J. Jiang and M. Kamel, Intrusion detection using hierarchical neural networks, Pattern Recognition Letters, Vol. 26, No. 6.
[16] S. Horng, M. Su, Y. Chen, T. Kao, R. Chen, J. Lai and D. C. Perkasa, A novel intrusion detection system based on hierarchical clustering and support vector machines, Expert Systems with Applications, Vol. 38, No. 1.
[17] N. A. Toosi and M. Kahani, A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers, Computer Communications, Vol. 30, No. 10.
[18] C. Xiang, C. P. Yong and S. L. Meng, Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees, Pattern Recognition Letters, Vol. 29, No. 7.
[19] X. Tong, Z. Wang and H. Yu, A research using hybrid RBF/Elman neural networks for intrusion detection system secure model, Computer Physics Communications, Vol. 180, No. 10.
[20] G. Wanga, J. Hao, J. Mab and L. Huanga, A new approach to intrusion detection using artificial neural networks and fuzzy clustering, Expert Systems with Applications, Vol. 37, No. 9.
[21] C. Xiang, Y. M. Chong and L. H. Zhu, Design of multiple-level tree classifiers for intrusion detection system, in Proc. IEEE Conf. on Cybernetics and Intelligent Systems, December, Singapore.
[22] The KDD CUP 1999 Data, 1999 (accessed June 2012).
[23] L. Curt, P. Samson, D. Wooten, G. Geniesse, M. Lucas, G. Fancourt, J. Gerstenberger, N. Euliano, G. Lynn, M. Allen and D. Marossero, NeuroSolutions, NeuroDimension Inc. (accessed June 2012).
[24] I. Witten, E. Frank and M. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Elsevier Inc.
[25] The DARPA Intrusion Detection Data Sets, available at: ideval/data/index.html (accessed June 2012).
[26] R. Beghdad, Critical study of neural networks in detecting intrusions, Computers & Security, Vol. 27, No. 5-6, 2008.
[27] G. H. John and P. Langley, Estimating Continuous Distributions in Bayesian Classifiers, in Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo.
[28] R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA.
[29] C. Chang and C. Lin, LIBSVM: A Library for Support Vector Machines.
[30] W. W. Cohen, Fast Effective Rule Induction, in Twelfth International Conference on Machine Learning.
[31] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press.
[32] T. Kohonen, Self-Organizing Maps, third extended edition, Springer.
[33] R. Williams and D. Zipser, A learning algorithm for continually running fully recurrent neural networks, Neural Computation, Vol. 1.
[34] Java SE Development Kit 7, available at: tion/index.html (accessed June 2012).
[35] MySQL (accessed June 2012).

Topic 3: Pervasive Systems and Multi-criteria Decision

Cloud based Decision Support System for urban planning

Imene Benatia, Mohamed Ridda Laouar, Hakim Bendjenna
Mathematics and Computer Science Department, Tebessa University, Tebessa, Algeria

Abstract: In this paper we propose a Cloud-based Decision Support System (DSS) for urban planning, in order to solve the problem of evaluating and choosing the best urban project and the problem of conflict between the decision makers. The suggested decisional system is built on a three-layer Cloud Computing architecture and includes the multiple criteria decision making (MCDM) method Promethee II as well as the Hare negotiation procedure, in order to provide the final choice to the decision makers.

Keywords: Cloud Computing, Decision Support System (DSS), Urban Project.

I. INTRODUCTION Nowadays Cloud Computing has become the universal reference in IT management. Cloud computing has been defined by the National Institute of Standards and Technology (NIST) as a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or cloud provider interaction [11]. Currently, Cloud Computing is used to solve problems in different domains of Information Technology (IT), namely Decision Support Systems (DSS), Geographical Information Systems (GIS), web application development, mobile technology, e-governance systems, scientific research, Enterprise Resource Planning (ERP), etc. [2]. Each urban city tries to solve its problems in various fields (habitat, transport, economy) by proposing urban projects to reach sustainable development.
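The abstract above names Promethee II as the MCDM method of the proposed DSS. As a grounding illustration, here is a minimal sketch of Promethee II net-flow ranking with the usual (strict) preference function; the paper does not state which preference functions or criterion weights it uses, so those are assumptions, and `promethee2` is a hypothetical helper name.

```python
import numpy as np

def promethee2(scores, weights):
    """Promethee II net outranking flows (usual preference function).

    scores  : (n_alternatives, n_criteria) matrix, higher is better
    weights : criterion weights, summing to 1

    Returns the net flow phi for each alternative; the best urban
    project is the one with the highest net flow.
    """
    n = scores.shape[0]
    phi = np.zeros(n)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            d = scores[a] - scores[b]
            pref_ab = (weights * (d > 0)).sum()   # pi(a, b)
            pref_ba = (weights * (d < 0)).sum()   # pi(b, a)
            phi[a] += (pref_ab - pref_ba) / (n - 1)
    return phi
```

For example, a project that dominates every other project on all criteria gets a net flow of 1, and the net flows of all alternatives always sum to zero.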
These projects represent a set of approaches for reaching an agreement between the different actors (decision makers). Urban projects are often characterized by contradictory criteria [13], which generates conflict between the decision makers; the resulting debate can affect the final decision. In this study we therefore propose a Cloud based Decision Support System (DSS) for urban planning that helps the decision makers choose the best urban project while avoiding conflict between them. Our contribution consists in implementing a decision support system deployed on a Cloud platform (Cloudify) and managed by the OpenStack infrastructure. This article is structured as follows: after this introduction, Section 2 presents the problem and describes our contribution. Section 3 reviews related work. Section 4 describes the suggested approach, and Section 5 is devoted to the implementation of our Cloud based DSS. Finally, the article ends with a conclusion.

II. PROBLEM AND CONTRIBUTION

Cities face many problems in various fields such as transport, habitat and trade, and each city tries to solve them by proposing various urban projects. The problem is that urban projects are multi-criteria in nature [13]: each project has several, often conflicting, criteria whose importance is not the same. The decision involves several entities (ecologist, sociologist, economist, etc.)
with contradictory interests. Each entity sees the project differently, according to its knowledge, objectives and concerns, which generates conflict between these actors during the evaluation and choice of the best urban project; the decision-making process is thus distributed among several entities with conflicting interests, which affects the final decision. Our contribution is an approach that resolves both the evaluation of urban projects and the conflict between the decision makers. It combines a Decision Support System (DSS) with Cloud Computing in order to benefit from the latter's advantages: cost reduction, easier communication, accessibility without temporal or geographical constraints, and reduced deployment and processing time. The proposed decision model allows:

- specifying the subjective parameters of the urban projects for each decision maker;
- performing a multi-criteria analysis for each decision maker using the Promethee II method [3], which ranks the actions (urban projects) from best to worst;
- executing a negotiation process based on the Hare procedure [1], which elects a winner among the multiple rankings produced by the multi-criteria analysis and provides the final decision to the whole set of decision makers.

III. RELATED WORK

In the context of urban planning, several researchers have addressed planning problems in various domains of the city (habitat, transport, ecology, evaluation, etc.). Yang and Chen [23] integrated a decision support system with a geographical information system (GIS) for public transport planning to decrease urban traffic in China. Jun and Yikui [6] developed a decision support system based on artificial intelligence for urban road network planning that helps different users move within the city. Since the rapid growth of urban areas generates large open spaces, Maktav, Jurgens, Siegmund, Sunar, Esbah, Kalkan, Uysa, Mercan, Akar, Thunig and Wolf [10] adopted a multi-criteria spatial decision support system which detects and evaluates open spaces in the city for use in urban planning. Shi and Li [18] describe an integrated decision support system, founded on an expert system, for urban traffic-related air pollution assessment and control. Fayech, Maouche, Hammadi and Borne [14] presented a multi-agent decision support system for urban transport network planning, responsible for detecting and analyzing possible disturbances in order to mitigate them.
Miah and Ahamed [12] proposed a decision support system deployed in a public Cloud which gives the Australian security authorities on-demand information about driver behavior on Australian roads. Yujing, Lin, Hui and Zhihua [24] constructed a decision support system model combining grey relational decision making and fuzzy AHP which helps government decision makers evaluate the performance of urban projects and avoid blind decisions. Alshawi and Dawood [5] developed a decision support system which allows decision makers to evaluate the construction projects of new cities, addressing the problem of slums in the Islamic world. These studies show that a DSS can be an effective tool for urban planning, using a wide range of techniques such as artificial intelligence, multi-agent systems, multiple criteria decision making (MCDM), geographical information systems (GIS), etc. In order to reduce deployment and processing time, improve communication and cooperation between the decision makers, facilitate accessibility and decrease cost, we propose a Decision Support System (DSS) built on the Cloud Computing architecture which can improve the effectiveness of urban project evaluation decisions.

IV. PROPOSED APPROACH FOR A CLOUD BASED DECISION SUPPORT SYSTEM

The proposed approach is composed of three phases: the first performs a multi-criteria analysis to elaborate rankings of the urban projects using the Promethee II method; the second executes the Hare negotiation process in order to determine the final choice (urban project) for the decision makers; the third integrates our Decision Support System (DSS), composed of the Promethee II and Hare methods, into a three-layer private Cloud.
Phase 1: Multi-criteria analysis

Multi-criteria decision aid is employed in various domains to help a decision maker choose the optimal solution among a set of solutions. According to Vincke [22], multi-criteria decision aid aims, as its name indicates, to provide a decision maker with tools enabling him to progress in resolving a decision problem where several, often contradictory, points of view must be considered. Many real problems can be formulated by defining a set of criteria which evaluate the performance of actions. When modeling a real-world decision problem with multiple criteria decision aid, different problematics can be considered [16]; they differ in the way alternatives are considered and in the type of result expected from the analysis. Roy identifies four problematics [17]: the choice problematic consists in working out a selection procedure; the sorting problematic carries out an assignment procedure; the ranking problematic consists in arranging the various actions from the best to the least good; and the description problematic describes the actions and their consequences. In our study the problem is a ranking problematic: several urban projects must be classified by the decision makers from best to worst in order to determine the best project to apply. Various methods solve this kind of problem, such as Promethee I, Promethee II, Electre II, Electre III and Electre IV. In our approach we use Promethee II to rank the urban projects from best to worst; in this case all the actions are comparable and it is possible to discriminate between them. Promethee II is a prominent multi-criteria decision aid method that builds a complete ranking on a set of potential actions by assigning each of them a so-called net flow score.
However, to calculate these scores, each pair of actions has to be compared [8]. In our case, each criterion is seen differently by each decision maker: the decision maker specifies the weight of each criterion according to his preferences, interests, knowledge and objectives. Table 1 shows the various urban housing development projects with their characteristics; a decision maker specifies the weight of each characteristic.

Table 1. Urban housing development projects and their characteristics (one row per project, one column per characteristic C1 to C6, plus a final row of decision-maker weights; the numeric values did not survive extraction)

Inspired by the concept of sustainable development, we classified the characteristics of the projects into three categories (social, economic and ecological) in order to help the decision makers specify the weights of the criteria.

Social characteristics:

C1: Block number: the number of buildings associated with a project.
C2: Number of apartments: the total number of houses to be built by the project.
C3: Apartment surface (m²): the area of the houses.

Economic characteristics:

C4: Time of work (months): the completion time of the project.
C5: Amount (Algerian dinars): the overall cost of the project.

Ecological characteristics:

C6: Ecological effect (takes the values 1, 2, 3): the project takes the value 1 if it is of low ecological quality, 2 if it is moderately ecological and 3 if it is highly ecological.

We use these characteristics as decision criteria because they represent the basic elements of an urban housing development project. The decision-making process with the Promethee II method is composed of the four steps presented hereafter [20].

Step 1: The first step computes, for each pair of projects and for each criterion, the value of the preference degree. Let g_j(pi) be the value of criterion j for a project pi. We note d_j(pi, pk) the difference of value of criterion j for two projects pi and pk:

d_j(pi, pk) = g_j(pi) − g_j(pk)    (1)

P_j(pi, pk) is the value of the preference degree of criterion j for the two projects pi and pk. The preference function used to compute these preference degrees is defined as:

P_j(pi, pk) = 0 if d_j(pi, pk) ≤ 0;  P_j(pi, pk) = 1 if d_j(pi, pk) > 0    (2)

Step 2: The second step aggregates the preference degrees of all criteria for each pair of projects into a global preference index. Let C be the set of considered criteria and w_j the weight associated with criterion j (specified by the decision maker). The global preference index for a pair of projects pi and pk is computed as:

π(pi, pk) = Σ_{j∈C} w_j · P_j(pi, pk)    (3)

Step 3: The third step, the first that concerns the ranking of the projects, computes the outranking flows. Let A be the set of projects and n the number of projects. For each project pi, the positive outranking flow Φ⁺(pi) and the negative outranking flow Φ⁻(pi) are computed by the following formulas:

Φ⁺(pi) = (1 / (n − 1)) Σ_{x∈A} π(pi, x)    (4)
Φ⁻(pi) = (1 / (n − 1)) Σ_{x∈A} π(x, pi)    (5)

The computed positive and negative outranking flows of each project are represented in Fig. 1.

Fig. 1. Outranking flows

Step 4: The last step uses the outranking flows to establish a complete ranking between the projects, based on the net outranking flow, computed for each project from the positive and negative flows:

Φ(pi) = Φ⁺(pi) − Φ⁻(pi)    (6)

The computed net outranking flow of each project is represented in Fig. 2.

Fig. 2. The net outranking flow

At this stage, we rank the projects in descending order of their Φ scores. The ranking of the projects is illustrated in Fig. 3.

Fig. 3. Ranking of projects by Promethee II
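The four steps above can be sketched in a few lines of Python. The project scores and criterion weights below are made-up illustrative values, not the figures of Table 1 (which did not survive extraction):

```python
# Promethee II (steps 1-4) with the "usual" preference function of eq. (2).
# Scores and weights are hypothetical illustration values.

def promethee2(scores, weights):
    """scores: {project: list of criterion values}, weights: criterion weights."""
    projects = list(scores)
    n = len(projects)

    def pref(a, b):  # global preference index pi(a, b), eq. (3)
        # usual criterion, eqs. (1)-(2): P_j = 1 iff g_j(a) - g_j(b) > 0
        return sum(w for w, ga, gb in zip(weights, scores[a], scores[b]) if ga > gb)

    net = {}
    for p in projects:
        plus = sum(pref(p, x) for x in projects if x != p) / (n - 1)   # eq. (4)
        minus = sum(pref(x, p) for x in projects if x != p) / (n - 1)  # eq. (5)
        net[p] = plus - minus                                          # eq. (6)
    return sorted(projects, key=net.get, reverse=True), net

scores = {  # hypothetical values for three maximization criteria
    "Project1": [12, 240, 80],
    "Project2": [8, 150, 65],
    "Project3": [10, 200, 75],
}
ranking, net = promethee2(scores, [0.5, 0.3, 0.2])
print(ranking)  # ['Project1', 'Project3', 'Project2']
```

Minimization criteria such as C4 (time of work) and C5 (cost) would be entered negated, so that a larger score is always better.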

In our context there are several decision makers, each expressing his personal point of view by specifying the subjective parameters (i.e., fixing the weight of each criterion for each urban project). We therefore run Promethee II once per decision maker and obtain several rankings, one per decision maker, as shown in Fig. 4; this creates a conflict between the decision makers over which project is best among the set of rankings. To solve this problem we propose a negotiation process. Figure 5 illustrates the functioning of our proposed DSS, composed of two steps, the multi-criteria analysis and the negotiation process, which together give the final choice (the best urban project) to the decision makers.

Fig. 4. Multi-criteria analysis using Promethee II

Phase 2: Negotiation process

Negotiation is a process by which several parties reach a common decision. Some researchers have used negotiation to solve urban planning problems. Longfei and Hong [9] proposed an approach for resolving the city parking problem using a negotiation process based on route utility calculation. Takahashi, Kanamori and Ito [19] described a route providing method that achieves efficient traffic flow, based on anticipatory stigmergy for estimating vehicle positions and a negotiation system for changing route assignments. Di Lecce and Amato [7] proposed a solution to the problem of transporting hazardous goods by creating a new route planning approach based on a multi-agent cooperative negotiation paradigm. Negotiation approaches are grouped into three families: voting systems, which are used to choose one solution among several; auctions, which allow people to sell their goods at the best price; and negotiation by argumentation, which enables agents to convince the others through the choice of good arguments [21].
In order to resolve the conflict between the decision makers we use a voting system to elect one urban project among the set of rankings produced by the multi-criteria analysis. Different voting methods exist, such as the Condorcet method, the Borda method and the Hare method. For our Decision Support System (DSS) we employ the Hare method, which determines the social choice by successively eliminating the least desired alternatives (urban projects) until the best urban project is elected. We chose the Hare method because it presents no such disadvantage and is guaranteed to find the social choice, whereas the Borda method finds a social choice but encourages tactical or reasoned voting, and the Condorcet method may fail to find the social choice at all [21].

Fig. 5. Proposed DSS for evaluation of urban projects

Suppose we have five decision makers; the results of the multi-criteria analysis for each decision maker are given in Table 2 (each column lists one decision maker's ranking, best first):

Table 2. Ranking of projects according to each decision maker

Rank | DM 1     | DM 2     | DM 3     | DM 4     | DM 5
1    | Project1 | Project1 | Project5 | Project3 | Project1
2    | Project3 | Project3 | Project3 | Project1 | Project3
3    | Project5 | Project4 | Project1 | Project4 | Project5
4    | Project4 | Project2 | Project2 | Project2 | Project4
5    | Project2 | Project5 | Project4 | Project5 | Project2

The Hare method uses these results to determine the best project: the best urban project given by the Hare method is Project 1.

Phase 3: Integration of the DSS in a private Cloud

In this section we present the architecture of the private Cloud suggested for integrating our proposed Decision Support System (DSS) for the evaluation of urban projects. The suggested architecture is illustrated in Fig. 6.
We chose to integrate our DSS in a private Cloud because the proposed DSS is not intended for the general public; it is used by a group of decision makers. By deploying our DSS in a private Cloud, the data is better protected and secured.

Fig. 6. Integration of the DSS in a private Cloud

Our Cloud architecture is based on three essential layers: an infrastructure layer, a platform layer and a user layer.

Infrastructure layer: The infrastructure layer provisions virtual resources using standard equipment, namely units with high computing, processing and storage capacity. At this level we use an IaaS (Infrastructure as a Service) Cloud to provide a virtualized, distributed and automated infrastructure base; the basic function of an IaaS Cloud is the creation of virtual machines. In the context of our study, the city comprises several domains such as transport, housing development, commerce, networks, etc. In each domain, the city offers many urban projects that must be studied and evaluated by different decision makers; we therefore use the IaaS Cloud to create a virtual server for each domain of the city.

Platform layer: The development platform rests on the infrastructure layer. It is the central layer of the architecture; it uses the resources provided by the IaaS layer and offers all the elements and tools necessary to support the construction, deployment and life cycle of applications. At this level we deploy a PaaS (Platform as a Service) Cloud in each virtual server created by the IaaS Cloud. We employ this PaaS Cloud for its tooling, which lets us create a virtual machine in which we deploy our proposed Decision Support System (DSS).

User layer: In this layer the end users (decision makers) access our Decision Support System (DSS) through a web browser in order to participate in the evaluation and choice of the best urban project.

V. IMPLEMENTATION OF THE CLOUD BASED DECISION SUPPORT SYSTEM FOR THE EVALUATION OF URBAN PROJECTS

This section discusses the implementation of our proposed architecture for integrating the suggested Decision Support System in a private Cloud.
The implementation scenario is illustrated in Fig. 7.

Fig. 7. Cloud based DSS for evaluation of urban projects

A. Layer 1: infrastructure layer

For our study we chose the OpenStack Cloud [15] because it has proven itself among professionals of the domain: it is a robust system for building private Clouds and it supports a development platform (i.e., it allows a PaaS Cloud to be deployed on top of it to develop applications). OpenStack implements a virtual server and storage system. Using its components (Nova, Swift, Glance) we can create virtual machines, each of which corresponds to a domain of the city (transport, habitat, trade...).

B. Layer 2: platform layer

For our architecture we chose Cloudify [4] because it is a platform that can be deployed on OpenStack and can integrate any middleware (Java, PHP, .NET, Ruby or others). This open-source solution allows a custom PaaS to be built and carries out the set of tasks relating to deploying an application on a Cloud and managing its life cycle. Within each virtual machine created by OpenStack we deploy Cloudify, which in turn creates a virtual machine in which it deploys our application; the application contains three services (Tomcat, the Decision Support System "DSS" and the database). For each connection of a decision maker, Cloudify allocates to him an instance of each service of the application.

- Tomcat service: the web interface that mediates the interaction between the decision makers and the Decision Support System.
- Database service: the database allows the project managers to store all the proposed urban projects, with their characteristics, for solving a problem in a domain of the city. The decision support system communicates with this database in order to collect information on the projects.
- DSS service: the Decision Support System (DSS) presented in Section 4, which contains the Promethee II and Hare methods. The DSS interacts with the database to collect all information concerning the urban projects in order to carry out the multi-criteria analysis that ranks the proposed urban projects. A negotiation process is then launched to deliver the final choice to the decision makers via the Tomcat interface.

C. Layer 3: user layer

In order to solve a problem in a field of the city, each decision maker can access the application and participate in the decision-making process from any device (computer, smartphone, tablet...) equipped with a web browser and an internet connection.

VI. CONCLUSION

In this article we presented an approach that helps the decision makers of a city choose the best urban project in order to solve a problem in a field of the city. Our approach is composed of three steps: in the first step we used the Promethee II method to rank the urban projects for the different decision makers; in the second we employed the Hare method, which uses the results provided by Promethee II to elect the best urban project; and in the third, to improve our DSS composed of the Promethee II and Hare methods, we integrated it into a three-layer private Cloud. Finally, we carried out a case study on an urban housing development project in order to validate our proposed Decision Support System.

REFERENCES

[1] A.D. Taylor and A.M. Pacelli. Mathematics and Politics: Strategy, Voting, Power and Proof. 2nd ed., ch. 1.
[2] M. Bhat, R. Shah, B. Ahmad and I. Bhat. Cloud Computing: A Solution to Information Support Systems (ISS). International Journal of Computer Applications, 11(5).
[3] J. Brans and P. Vincke. A Preference Ranking Organization Method. Management Science, 31(6).
[4] Cloudify: Bringing DevOps Automation to the World of PaaS. Available at:
[5] I. Dawood and M. Alshawi. Decision Support Systems (DSS) Model for the Housing Industry. Second International Conference on Developments in eSystems Engineering (DESE), IEEE.
[6] D. Jun and M. Yikui. Intelligent Decision Support System for Road Network Planning. ISECS International Colloquium on Computing, Communication, Control, and Management, IEEE.
[7] V. Di Lecce and A. Amato. Route planning and user interface for an advanced intelligent transport system. IET Intelligent Transport Systems, 2011.
[8] S. Eppe and Y. De Smet. Approximating Promethee II's net flow scores by piecewise linear value functions. CoDE-SMG Technical Report Series, 2012.
[9] W. Longfei and C. Hong. Cooperative Parking Negotiation and Guidance Based on Intelligent Agents. International Conference on Computational Intelligence and Natural Computing, IEEE.
[10] D. Maktav, C. Jurgens, A. Siegmund, F. Sunar, H. Esbah, K. Kalkan, C. Uysa, O.Y. Mercan, I. Akar, H. Thunig and N. Wolf. Multi-criteria Spatial Decision Support System for Valuation of Open Spaces for Urban Planning. 5th International Conference on Recent Advances in Space Technologies (RAST), IEEE.
[11] P. Mell and T. Grance. The NIST definition of Cloud computing (draft). NIST Special Publication, 800(145).
[12] S.J. Miah and R. Ahamed. A Cloud-Based DSS Model for Driver Safety and Monitoring on Australian Roads. International Journal of Emerging Sciences, 1(4).
[13] I.C. Schutte and A. Brits. Prioritising transport infrastructure projects: towards a multi-criterion analysis. Southern African Business Review, 16(3).
[14] B. Fayech, S. Maouche, S. Hammadi and P. Borne. Multi-agent decision-support system for an urban transportation network. Proceedings of the 5th Biannual World Automation Congress, 2002, IEEE.
[15] OpenStack: Open source software for building private and public clouds. Available at:
[16] B. Roy. Multicriteria Methodology for Decision Aiding. Kluwer Academic, Dordrecht, 1996.
[17] B. Roy. Méthodologie multicritère d'aide à la décision. Ed. Economica.
[18] Y. Shi and J. Li. Improving the Decisional Context: New Integrated Decision Support System for Urban Traffic-related Environment Assessment and Control. International Conference on Mechanic Automation and Control Engineering (MACE), IEEE.
[19] J. Takahashi, R. Kanamori and T. Ito. Evaluation of Automated Negotiation System for Changing Route Assignment to Acquire Efficient Traffic Flow. 6th IEEE International Conference on Service-Oriented Computing and Applications (SOCA), IEEE.
[20] P. Taillandier and S. Stinckwich. Using the PROMETHEE multi-criteria decision making method to define new exploration strategies for rescue robots. IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Kyoto, Japan.
[21] M.H. Verrons. GeNCA : un modèle général de négociation de contrat entre agents. Thèse de doctorat, Université des Sciences et Technologies de Lille, 2004.
[22] P. Vincke. L'aide multicritère à la décision. 1st ed., Éditions de l'Université de Bruxelles, Belgique.
[23] X.M. Yang and X.W. Chen. Integrated decision support system designed for Chinese city public transportation planning and operation. 5th IEEE International Conference on Intelligent Transportation Systems, IEEE.
[24] W. Yujing, L. Lin, Z. Hui and L. Zhihua. Decision Model of Urban Infrastructure Projects Based on Comprehensive Evaluation. 2nd IEEE International Conference on Emergency Management and Management Sciences (ICEMMS), IEEE.

Steganalysis based on Zipf's law

Lakhdar Laimeche
Tebessa University, Computer Sciences Department, Tebessa, Algeria

Abstract — In this paper, we propose a novel steganalytic method to effectively attack advanced steganographic methods. Our method presents a double originality: on the one hand, it is based on the application of Zipf's law over a wavelet decomposition; on the other hand, it extracts a new feature vector based on the auto-similarity between wavelet sub-bands and the dependencies between wavelet sub-band coefficients. Since the hidden message is usually independent of the cover image, the embedding process tends to decrease, to some extent, the dependencies existing in cover images. We therefore exploit the alteration of the Zipf curves of each sub-band to extract a new set of statistics which can differentiate between cover and stego images.

Keywords — Steganalysis; Steganography; Zipf's law; Zipf curve; discrete wavelet transform.

I. INTRODUCTION

Steganography refers to the science of invisible communication. Unlike cryptography, whose goal is to secure communications from an eavesdropper, steganography strives to hide the very presence of the message itself from an observer. Several steganographic techniques for images have been developed; they can be divided into two types: spatial embedding techniques, in which secret messages are embedded into the least significant bits of pseudo-randomly chosen pixels or palette colors (e.g., [1]-[2]), and transform embedding techniques, in which secret messages are embedded by modifying frequency coefficients of the cover image (e.g., [3]-[4]). The counterpart of steganography, called steganalysis, aims to detect the presence of secret messages and even to extract them. Current steganalytic methods are classified into two classes [5]: specific steganalysis (e.g., [6], [7]) and universal steganalysis (e.g., [8], [9]).
Co-authors: Farida Hayet Merouani (Badji Mokhtar University, Computer Sciences Department, LRI Laboratory, Annaba, Algeria) and Smaine Mazouzi (Skikda University, Computer Sciences Department, Skikda, Algeria).

Universal steganalysis attempts to detect the presence of an embedded message independently of the embedding algorithm and of the image format, while specific steganalysis approaches take advantage of particular algorithmic details of the embedding algorithm. Many steganalytic methods have been proposed in the literature [10-19]. Almost all of them rely on the fact that steganographic methods leave statistical artifacts in an image, which can be used to distinguish between stego and cover images. For example, in [12] the authors assume that steganography affects the histograms of the images, which they measure via the center of gravity of the characteristic function of the RGB probability density functions (pdf). Farid [11] assumes that the correlation across wavelet bands is affected, while Avcibas et al. demonstrate that image quality metrics are perturbed [10]. In [14] it is shown that the nth moment of the wavelet characteristic function is related to the nth magnitude of the derivative of the corresponding wavelet histogram and hence is sensitive to data embedding. Fridrich et al. assume that the histogram of DCT coefficients is modified [19]. In this paper, in order to capture these statistical artifacts and hence determine the presence of hidden messages, we propose a set of statistics extracted from wavelet sub-bands and based on Zipf's law. The basic idea is that the general shape of the Zipf curve varies significantly with the content of a wavelet sub-band, and any steganographic manipulation will alter it. The rest of this paper is organized as follows. Section 2 introduces Zipf's law and the existing methods that apply it to images. A review of Farid's steganalytic method is given in Section 3.
The proposed method is detailed in Section 4, experimental results are presented in Section 5 and the paper is concluded in Section 6.

II. ZIPF'S LAW STATEMENT

Zipf's law is an empirical law described by G.K. Zipf in [20], also known as the principle of least effort. It deals with the statistical distribution of the appearance frequencies of patterns; the example described by Zipf is the frequency of words in the English language. The idea of Zipf's law applies to a set of topologically organized symbols whose n-tuples do not follow a random distribution: a power law appears when the frequencies are sorted in decreasing order. This can be described by the following expression:

f(r) = C · r^(−α)    (1)

where f(r) is the frequency of the symbol of rank r, r is the rank of the symbols sorted in decreasing frequency order and C is a constant. The parameter α characterizes the power law. If the frequency-over-rank graph is plotted on a double-logarithmic scale, the result is a straight line. Several methods have used Zipf's law as an analysis tool for images. The majority of these methods study the spatial domain directly, with applications like artificial object detection in

natural environments [21], compressed image quality and Zipf's law [22], and power-law models for detecting regions of interest [23]. The elements used to verify Zipf's law are small sub-squares of an image: each pattern is a 3x3 matrix of adjacent pixels coded using the general rank method [24]. In this method, the grayscale levels of the pattern are sorted in increasing order and a number is assigned to each pixel according to its rank in the local grayscale order, as shown in Fig. 1.

Fig. 1. Original pattern (a), coded by the general rank method (b)

To plot the Zipf curve of an image, which is based on the pattern distribution, the image is scanned with a 3x3 mask and each pattern is stored in a two-dimensional dynamic array containing, for each pattern, its rank and its number of occurrences in the image. If a new pattern is not yet in the array, it is added at the end; otherwise the number of occurrences of the current pattern is increased by 1. When the pattern counting is completed, the array is sorted in decreasing order. If the frequency over rank is plotted in a double-logarithmic scale diagram, the result is a Zipf curve, as shown in Fig. 2.

Fig. 2. Zipf curve of the image Cameraman

To date, the only application proposed in the frequency domain analyzes an acoustic signal based on Zipf's law [25, 26]. This method extracts a set of parameters based on Zipf's law to characterize and detect regions of interest in the signal. The audio signal is decomposed with the continuous wavelet transform using k different values of the scale factor, so that each position t is associated with a profile of k coefficients:

p(t) = {c_1(t), ..., c_k(t)}

Once these profiles are computed, a coding model converts them into words to obtain a text. The obtained text is divided into sub-texts in order to detect and quantify regions of interest in the signal; these sub-texts are then analyzed with Zipf's law. The authors note that the Zipf curves of two regions of a signal can be split into two parts: the first one, corresponding to the first ranks, is non-linear, contrary to the second, which is almost linear. In this method, analyzing an acoustic signal consists in studying the evolution of the slopes of the two parts of the Zipf curves.

III. REVIEW OF FARID'S WORK

The wavelet transform is a multi-scale transformation of image data that can be viewed as giving the spatial location of frequency content. Many natural-scene images display strong correlations between the scales of a multi-scale decomposition. Farid [11] exploited these statistical regularities, using three different SVMs (linear, non-linear and one-class with six hyperspheres) to detect steganographic content in images produced by three different embedding algorithms: Outguess, F5 and Jsteg. He first decomposes an image with separable quadrature mirror filters, applying separable low-pass and high-pass filters along the image axes to generate a vertical, a horizontal, a diagonal and a low-pass sub-band. For each sub-band, higher-order moments, including mean, variance, skewness and kurtosis, are computed at each of the first 3 levels; with 3 sub-band images per level, this yields 4 x 3 x 3 = 36 feature values. A second set of features is based on the errors of a linear predictor of coefficient magnitude. For example, the coefficient of the vertical sub-band V_i(x, y) can be predicted by:

|V_i(x, y)| = w_1 |V_i(x−1, y)| + w_2 |V_i(x+1, y)| + w_3 |V_i(x, y−1)| + w_4 |V_i(x, y+1)| + w_5 |V_{i+1}(x/2, y/2)| + w_6 |D_i(x, y)| + w_7 |D_{i+1}(x/2, y/2)|    (2)

where the w_k are the unknown scalar weighting values, i denotes the scale or level value, and (x, y) denotes the pixel location in the image sub-band. For each sub-band type (vertical, diagonal, and horizontal), the expression can be written in matrix form.
For example, for the vertical sub-band we write

V = Q w   (3)

where the column vector V contains the coefficient magnitudes and the columns of the matrix Q contain the corresponding neighboring magnitudes. The unknown weights w are then solved for using the least-squares solution

w = (Q^T Q)^{-1} Q^T V   (4)

Finally, the log error of this predictor is given by

E = log2(V) - log2(|Q w|)   (5)

Thus, the four moments of the log error of the three high-frequency sub-band coefficients at each scale are collected, for a total of 12 features per scale. For a 3-level decomposition we obtain 36 statistical moments of the log error, yielding a total of 72 statistics. Experimental results showed that these features have a good performance at discriminating cover and stego images.

90 IT4OD 2014
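The least-squares fit of equations (3)-(5) and the four moments of the log error can be sketched as follows. This is an illustrative sketch: it assumes the magnitudes in v and the neighbor magnitudes in the columns of Q have already been collected, and the function and variable names are ours, not Farid's.

```python
import numpy as np

def log_error_moments(v, Q):
    """Solve v ~ Q w in the least-squares sense (eq. 4), form the log
    error e = log2(v) - log2(|Q w|) (eq. 5), and return its first four
    moments (mean, variance, skewness, kurtosis)."""
    w, *_ = np.linalg.lstsq(Q, v, rcond=None)   # w = (Q^T Q)^-1 Q^T v
    e = np.log2(v) - np.log2(np.abs(Q @ w))
    m, s2 = e.mean(), e.var()
    skew = ((e - m) ** 3).mean() / s2 ** 1.5
    kurt = ((e - m) ** 4).mean() / s2 ** 2
    return np.array([m, s2, skew, kurt])
```

Collecting these four moments for the V, H and D sub-bands of the first 3 levels gives the 36 log-error statistics mentioned above.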

IV. PROPOSED METHOD

Zipf's law can be used to model the distribution of patterns in an image [27]. In this section, we propose an efficient universal steganalytic method, based on Zipf's law and wavelet decomposition, to distinguish between stego and cover images.

A. Model Encoding of Wavelet Sub-Bands

In our model encoding, we use the general rank method [24] to code the wavelet sub-band coefficients. This method numbers the wavelet coefficients inside an N x N block according to the ascending order of their values, assigning the same rank to coefficients with equal values. We assign the value 0 to the lowest wavelet coefficient and increment this value by one unit whatever the relative variation between two consecutive coefficients. We chose the general rank method for three reasons: 1) the wavelet coefficients at scale n are not organized in a random way, but depend on their parents at scale n-1; 2) the choice of the size and the shape of the patterns (square or linear) depends on the parent-children relations across sub-bands [28, 29]; 3) the general rank method is sensitive to the weak variations of the wavelet coefficients. Finally, to plot the Zipf curves of each sub-band, patterns are stored in a two-dimensional dynamic array containing, for each pattern, its code and its number of occurrences in the sub-band. The sub-band is scanned with an n x n mask, where n depends on the parent-children relations across sub-bands, and each pattern is compared with the patterns already contained in the array. If a pattern is not yet in the array, it is added at the end; otherwise the number of occurrences of the current pattern is increased by 1. When the pattern counting is complete, the array is sorted in decreasing order. B.
Analyzing Zipf Curve Changes of Sub-Bands

Analyzing a wavelet sub-band using Zipf's law proceeds as follows: a) code the patterns using the general rank method and assign to each distinct pattern its frequency of appearance in the sub-band; b) sort these patterns in decreasing order of their frequency of appearance; c) plot frequency versus rank on a log-log scale; the resulting plot is called the Zipf curve, a set of points aligned on a line whose slope is equal to α according to formula (1). For a wavelet decomposition with scales S = 1..n, we obtain n-1 Zipf curves.

We began our analysis by studying the Zipf curve changes of the wavelet sub-bands of cover images, using the Haar wavelet for the decomposition. Three example images are presented in figure 3: the first contains smooth areas (image Plane), the second contains details such as edges (image Baboon), and the last (image Lena) combines smooth and edge areas. The Zipf curves of each sub-band at the second scale of these three images can be seen in figure 4.

Fig. 3. Tested images: Plane (a), Baboon (b), Lena (c)

Fig. 4. Zipf curves of the sub-bands (horizontal, vertical and diagonal) at the second scale of images Plane (a), Baboon (b) and Lena (c)

From figure 4, it can be seen that the Zipf curve of a wavelet sub-band consists of two parts. The first, linear part corresponds to patterns that share the same frequencies; these patterns correspond to the lowest wavelet coefficients in each sub-band. The patterns that contribute to the second, non-linear part correspond to the highest wavelet coefficients in each sub-band (the details present in the image).
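The general rank coding (figure 1) and steps a) to c) can be sketched together as follows. This is an illustrative sketch: it assumes an overlapping n x n scan of the sub-band and returns the curve as log10(rank), log10(frequency) pairs; the function names are ours, not the paper's.

```python
import numpy as np
from collections import Counter

def general_rank_code(block):
    """General rank method: each distinct value, in increasing order,
    receives rank 0, 1, 2, ...; equal values share the same rank."""
    rank_of = {v: r for r, v in enumerate(sorted(set(block)))}
    return tuple(rank_of[v] for v in block)

def zipf_curve(subband, n=3):
    """Scan a sub-band with an n x n mask, code each pattern with the
    general rank method, count pattern occurrences, sort them in
    decreasing order of frequency, and return the log-log Zipf curve."""
    counts = Counter()
    h, w = subband.shape
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            pattern = tuple(subband[y:y + n, x:x + n].ravel())
            counts[general_rank_code(pattern)] += 1
    freqs = np.array(sorted(counts.values(), reverse=True), dtype=float)
    ranks = np.arange(1, freqs.size + 1, dtype=float)
    return np.log10(ranks), np.log10(freqs)
```

Applied once per sub-band (H, V, D) and per level, this yields the per-sub-band curves whose statistics are extracted below.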

In the second step of our analysis of the Zipf curves of the wavelet sub-bands (H, V, D), we assume that any steganographic manipulation of a wavelet sub-band alters the statistical distribution of patterns. With an embedding rate of 0.1, a secret message was embedded into the cover image Lena using Jsteg [30] (Jsteg is a freely available algorithm; it takes as input an image in a bitmapped format and returns a stego image in JPEG format). Figure 5 shows the Zipf curves of each wavelet sub-band of the cover and stego image Lena; the different repartition of the patterns in the sub-bands is visible especially in the non-linear part of the Zipf curves, which corresponds to the highest wavelet coefficients. It can also be seen that LSB steganography modifies various statistical characteristics related to Zipf's law, such as the highest pattern frequency present in the sub-band, the total number of distinct patterns, the rank of each distinct pattern and the slope α of the Zipf curve, which leads to the modification of the Zipf curve shown in figure 5.

Fig. 5. Zipf curves of the sub-bands (V, H and D) of the cover (a) and stego (b) image Lena

C. Feature Extraction

The application of Zipf's law allows us to extract a set of statistics from the Zipf curves, such as: the total number of patterns present in the sub-band, the number of distinct patterns, the rank of each pattern, the slope of the straight line, the ordinate at the origin of the Zipf curve, the area under the Zipf curve and the entropy. Our feature vector is composed, firstly, of four features based on Zipf's law, which are defined as follows:

The ordinate at the origin. The ordinate at the origin of the Zipf curve is the highest pattern frequency in the sub-band. For example, the ordinates at the origin of the Zipf curves associated with the vertical, horizontal and diagonal sub-bands of the cover and stego image Lena are shown in table I.

TABLE I. VALUES OF THE ORDINATE AT THE ORIGIN OF THE ZIPF CURVE OF EACH SUB-BAND (VERTICAL, HORIZONTAL, DIAGONAL) IN BOTH COVER AND STEGO IMAGES

It can be seen in table I that the ordinates at the origin of the Zipf curves associated with the sub-bands of the stego image have increased compared to those of the cover image.

Entropy. The entropy allows us to measure the uniformity of the Zipf curve distribution. We use the entropy defined by Caron et al. [23]:   (6)

In this formula, I(f) represents the number of distinct patterns having a frequency of appearance equal to f, F is the total number of pattern frequencies in the sub-band, and R is the number of distinct patterns present in the sub-band. The entropy values lie between 0 and 1: the entropy is maximal when all patterns have the same frequency and minimal when the relative frequencies of the different patterns equal 1. This definition of entropy gives more weight to the highest wavelet coefficients (distinct patterns). Table II shows the entropy values after the embedding operation. For the stego image the entropy values decrease, which can be interpreted as the existence of many patterns with different frequencies in the sub-bands.

TABLE II. VALUES OF THE ENTROPY OF EACH SUB-BAND (VERTICAL, HORIZONTAL, DIAGONAL) IN BOTH COVER AND STEGO IMAGES

The slope P. The third feature is a measure of the curve's slope; it is computed by the least-squares regression method:   (7)

This value generally depends on the quality of the alignment of the Zipf curve, i.e., on the adequacy of Zipf's law to the sub-band. Table III shows the values of the slope P of the Zipf curves of each sub-band of the cover and stego image.

TABLE III. VALUES OF THE SLOPE P OF THE ZIPF CURVE OF EACH SUB-BAND (VERTICAL, HORIZONTAL, DIAGONAL) IN BOTH COVER AND STEGO IMAGES

We can notice that the slope P values obtained from each sub-band of the cover and stego image are significantly different. This can be interpreted by the different repartition of the pattern distributions of the sub-bands.

The area under the Zipf curve. Another feature that can be used in the proposed method is the area under the Zipf curve (AUZC). It is computed with the formula defined in [22]:   (8)

where f_i is the logarithm of the frequency of pattern i and r_i is the logarithm of the rank of each pattern. From the Zipf curves of the sub-bands of the cover and stego image we computed the AUZC, and we observed that the area under the Zipf curve increases when there is a hidden message in the image, as shown in table IV.

TABLE IV. VALUES OF THE AREA UNDER THE ZIPF CURVE OF EACH SUB-BAND (VERTICAL, HORIZONTAL, DIAGONAL) IN BOTH COVER AND STEGO IMAGES

The remaining four features consist of the mean, variance, skewness and kurtosis. Finally, our proposed feature vector is computed at each of the 3 levels; since at each level there are 3 sub-band images, we obtain 8 x 3 x 3 = 72 feature values.

V. EXPERIMENTAL RESULTS

In order to evaluate the performance of the proposed steganalytic method, four experiments are performed here for the cases of 2LSB, OutGuess and F5 embedding. We used 1400 images downloaded from the UCID database; originally very high resolution color images, they were reduced in size and converted to grayscale.
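The slope P and the AUZC described above can be sketched from the log-log points of a Zipf curve as follows. This is a sketch: the exact area formula (8) is defined in [22], and the trapezoid rule used here is our assumption for it; the function name is ours.

```python
import numpy as np

def slope_and_auzc(log_rank, log_freq):
    """Slope P of the Zipf curve by least-squares regression on the
    log-log points, and area under the curve by the trapezoid rule."""
    P = np.polyfit(log_rank, log_freq, 1)[0]
    # Trapezoid rule over the (log rank, log frequency) points.
    auzc = float(np.sum((log_freq[1:] + log_freq[:-1]) / 2.0
                        * np.diff(log_rank)))
    return P, auzc
```

For an ideal Zipf distribution the points fall on a straight line, so P directly estimates the exponent α of formula (1).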
To obtain stego images, secret messages were embedded into the images at different rates p in {0.01, 0.05, 0.1, 0.15} using, separately, 2LSB, OutGuess and F5 embedding, resulting in three groups of stego images. The 2LSB technique operates in the spatial domain, while the second set of techniques, which includes OutGuess [3] and F5 [31], operates in the JPEG domain; we therefore compressed the image dataset to JPEG format with quality factors ranging from 70 to 90. In the classification process, we tested each group of stego images separately: we randomly selected half of the original images and the corresponding half of the stego images for training, and kept the remaining half of the cover/stego pairs for testing. For comparison, we also implemented the steganalysis scheme proposed by Farid [11] and applied it to the same set of images and the same steganographic methods, with the same training and testing procedures. All the results are listed in table V.

TABLE V. PERFORMANCE COMPARISON USING SVM CLASSIFICATION. For each embedding technique (OutGuess, F5, 2LSB) and each embedding rate p, the table reports the false positives, the false negatives and the accuracy of Farid's method and of the proposed method.
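The train/test protocol just described (half of the cover/stego pairs for training, the remaining half for testing, SVM classification) can be sketched as follows. The feature matrices here are synthetic stand-ins for the 72-dimensional vectors, and scikit-learn is used purely for illustration; the paper does not name its SVM implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-ins: one 72-dimensional feature vector per image.
rng = np.random.default_rng(0)
X_cover = rng.normal(0.0, 1.0, (700, 72))
X_stego = rng.normal(0.3, 1.0, (700, 72))   # slightly shifted statistics
X = np.vstack([X_cover, X_stego])
y = np.array([0] * 700 + [1] * 700)         # 0 = cover, 1 = stego

# Half of the pairs for training, the remaining half for testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
acc = SVC(kernel="rbf").fit(X_tr, y_tr).score(X_te, y_te)
```

False positives and false negatives, as reported in table V, can then be read off the confusion matrix of the predictions on the test half.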

We can learn from table V that the proposed method proves superior for 2LSB embedding, while Farid's method proves superior in the case of the F5 and OutGuess techniques.

VI. CONCLUSION

We have presented a novel steganalytic method to effectively attack advanced steganographic methods. A new feature vector based on Zipf's law has been extracted from the sub-bands of the discrete wavelet transform to capture the changes between sub-bands in the same spatial orientation (self-similarity). The experimental results have demonstrated that the proposed method is superior for 2LSB embedding, while Farid's method proves superior in the case of the F5 and OutGuess techniques.

REFERENCES
[1] J. J. Hansmann, Steganos, Deus Ex Machina Communications.
[2] A. Brown, S-Tools for Windows, shareware (versions 3 and 4).
[3] N. Provos, OutGuess: universal steganography.
[4] A. Latham, Steganography: JPHIDE and JPSEEK, 1999.
[5] A. Nissar and A. H. Mir, "Classification of steganalysis techniques: a study," Digital Signal Processing, 2010.
[6] N. F. Johnson and S. Katzenbeisser, "A survey of steganographic techniques," in: S. Katzenbeisser and F. Petitcolas (Eds.), Information Hiding, Artech House, Norwood, MA, 2000.
[7] J. Fridrich and M. Goljan, "Practical steganalysis of digital images: state of the art," in: Proc. SPIE Photonics West, Electronic Imaging 2002, Security and Watermarking of Multimedia Contents, San Jose, CA, vol. 4675, January 2002.
[8] T. Al Hawi, M. Al Qutayari, and H. Barada, "Steganalysis attacks on stego images using stego-signatures and statistical image properties," in: TENCON 2004, Region 10 Conference, vol. 2, 2004.
[9] M. U. Celik, G. Sharma, and A. M. Tekalp, "Universal image steganalysis using rate-distortion curves," in: Proc. IS&T/SPIE 16th Annu. Symp. Electronic Imaging Science and Technology, San Jose, CA, January 2004.
[10] I. Avcibas, N. Memon, and B.
Sankur, "Steganalysis using image quality metrics," in: Security and Watermarking of Multimedia Contents, SPIE, San Jose, CA.
[11] H. Farid, "Detecting hidden messages using higher-order statistical models," in: Proc. IEEE Int. Conf. Image Processing, Rochester, NY, vol. 2, September 2002.
[12] J. J. Harmsen and W. A. Pearlman, "Steganalysis of additive noise modelable information hiding," in: Proc. SPIE Electronic Imaging 2003, Security and Watermarking of Multimedia Contents, San Jose, CA, January 2003.
[13] W. Lie and G. Lin, "A feature-based classification technique for blind image steganalysis," IEEE Trans. Multimedia, vol. 7, no. 6, December 2005.
[14] G. Xuan et al., "Steganalysis based on multiple features formed by statistical moments of wavelet characteristic functions," in: Lecture Notes in Computer Science, vol. 3727, Springer-Verlag, Berlin, 2005.
[15] M. A. Joshi, Digital Image Processing: An Algorithmic Approach, Prentice Hall of India, New Delhi.
[16] D. Zou, Y. Q. Shi, W. Su, and G. Xuan, "Steganalysis based on Markov model of thresholded prediction-error image," IEEE ICME.
[17] Y. Q. Shi et al., "Image steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image and neural network," ICME, 2005.
[18] X. Chen, Y. Wang, T. Tan, and L. Guo, "Blind image steganalysis based on statistical analysis of empirical matrix," IEEE ICPR.
[19] J. Fridrich, M. Goljan, and D. Hogea, "Steganalysis of JPEG images: breaking the F5 algorithm," in: Proc. 5th International Workshop on Information Hiding (IH 2002), Noordwijkerhout, the Netherlands, October 2002.
[20] T. Joachims, "Making large-scale SVM learning practical," in: B. Scholkopf, C. Burges, and A. Smola (Eds.), Advances in Kernel Methods: Support Vector Learning, MIT Press.
[21] G. K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, New York.
[22] Y. Caron, P. Makris and N.
Vincent, "A method for detecting artificial objects in natural environments," in: Proc. International Conference on Pattern Recognition (ICPR 2002), Québec City, Canada, 2002.
[23] N. Vincent, P. Makris and J. Brodier, "Compressed image quality and Zipf's law," in: Proc. International Conference on Signal Processing (ICSP, IFIC-IAPR WCC2000), Beijing, China, 2000, pp. 21-25.
[24] Y. Caron, P. Makris and N. Vincent, "Use of power law models in detecting region of interest," Pattern Recognition, Elsevier, vol. 40, issue 9.
[25] Bi, Asselin and Mraghni, "Spatial gray levels distribution based unsupervised texture segmentation," in: Proc. 3rd International Conference on Signal Processing (ICSP 1996), Beijing, China.
[26] E. Dellandrea, P. Makris, M. Boiron, and N. Vincent, "A medical acoustic signal analysis method based on Zipf law," in: Proc. International Conference on Digital Signal Processing, Santorini, Greece, 2002, vol. 2.
[27] E. Dellandrea, P. Makris, and N. Vincent, "Zipf analysis of audio signals," Fractals, World Scientific, 2004, vol. 12, no. 1.
[28] P. Makris and N. Vincent, "Zipf's law: a tool for image characterization," in: IAPR Vision Interface (VI 2000).
[29] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press.
[30] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Transactions on Circuits and Systems for Video Technology, 1996.
[31] D. Upham, Jsteg, software.
[32] A. Westfeld, "F5, a steganographic algorithm: high capacity despite better steganalysis," in: I. S. Moskowitz (Ed.), Information Hiding, 4th International Workshop, Lecture Notes in Computer Science, vol. 2137, Springer-Verlag, Berlin, Heidelberg, 2001.

A Cloud Computing model for Algerian universities

Lydia Yataghene, USTHB University; Kahina Ait-Idir, USTHB University; El Abbassia Deba, University of Oran; Abdelkader Eddoud, AUF - Algeria

Abstract: Private clouds are being developed in the university environment, as many institutions now face issues with the use of their IT infrastructure. The objective of this project is to build a cloud able to meet the needs of Algerian universities. This work is twofold: it first proposes a study of the various models of cloud computing, and secondly selects and adapts a private model of a national inter-university cloud based on an open-source solution. This private cloud essentially offers storage and data-sharing services based on the use of the OwnCloud product, for which an extension has been defined: client-side synchronization.

Keywords: Cloud Computing, data sharing, university, open source, virtualization, synchronization.

I. INTRODUCTION

Undeniably, Internet technology has grown exponentially since its inception. Fairly recently, a new trend has emerged in the field of information technology [1]: cloud computing. Cloud computing enables users and developers to use services without knowledge of, and without control over, the underlying IT infrastructure. Because of its increasing popularity, cloud computing has become one of the most discussed terms in both business and academic environments. It provides the academic world with a new way to create a unified, open and flexible environment while ensuring the confidentiality of academic data. In order to meet the requirements of students, of academic globalization, of emerging technologies and of energy constraints, an academic cloud aims to establish a private or community cloud offering solutions and services that are innovative, scalable and suited to the practices of today and tomorrow [2].
It also helps to support the development of digital applications, to improve management skills and the services offered to users, and to control the energy impact of university facilities. The cloud plays an important role in different areas by providing infrastructure, platforms and services that rely on virtualization, storage and data synchronization [3]. Different services can be set up in universities through the implementation of cloud computing, such as:
- an infrastructure that pools the resources of the universities into a common infrastructure [4];
- porting administrative and educational applications to help manage resources;
- creating storage and sharing spaces;
- collaborative spaces for group work;
- environments dedicated to specialized software (virtual machines).

A survey was conducted in Algerian universities; its important finding is that Algerian universities do not give their students, teachers and researchers access to services for collaborative work, who are therefore forced to use servers hosted outside the country that provide no guarantee about the safety of the shared information. We therefore identify a need in Algerian universities for storage and sharing spaces that enable the academic community to publish and share information in a very simple manner. This can be achieved by establishing a cloud computing solution within the universities. To this end, we propose a solution suited to the Algerian academic environment. Our work focuses on designing a cloud computing model for Algerian universities. Our approach was to study cloud computing, its concepts and the available technological solutions, and to assess the adoptability of the services offered by cloud computing by academia. Our proposal is based on an investigation of the uses and needs of academics, to highlight the most requested services.
Highlighting these needs helps us propose a suitable cloud model for Algerian universities.

II. CLOUD COMPUTING: DEFINITION, MODELS, TYPES

A. Definition of Cloud Computing

The literature abounds with definitions of cloud computing, but the definition of the National Institute of Standards and Technology (NIST) [5] is considered the base definition: cloud computing is a new way of delivering IT resources, not a new technology. The extended definition given by the NIST states that it is a model that allows on-demand network access to a shared pool of configurable computing resources, where the client can benefit from high flexibility with minimal management effort. More simply, the cloud refers to Internet technologies and is often represented schematically by a cloud. It is an abstract concept that combines several technologies to deliver services. Its purpose is to encourage companies to outsource the digital resources they store: storage capacity, computing power, software, management and other services are provided by third parties and made accessible, through an identification system, from a personal computer with an Internet connection. We are currently witnessing a growing interest in this new concept of cloud computing, materialized by increasing research in the area.

B. Service Models of Clouds

Whatever the definition, cloud computing is generally broken down into three types of services [6]:

1) Cloud Software as a Service (SaaS): the client has the ability to use the applications of the service provider over the network. These applications are accessed through various interfaces (thin client, web browser). Providers such as NetSuite are known examples of SaaS.

2) Cloud Platform as a Service (PaaS): the consumer may deploy its own applications on the cloud infrastructure, as long as the provider supports the programming language. Google App Engine is the main actor proposing this kind of platform.

3) Cloud Infrastructure as a Service (IaaS): the customer can rent processing capabilities, storage, network and other computing resources. Amazon Web Services Elastic Compute Cloud (EC2) and Simple Storage Service (S3) are the main providers of this type of cloud service.

C. Types of Cloud

Cloud computing comes in four deployment practices [6] that embody four types of accessibility:

1) Public Cloud: this type of infrastructure is accessible to a wide public and belongs to a provider of cloud services.

2) Private Cloud: the cloud infrastructure works for a single organization. It can be managed by the organization itself (internal private cloud) or by a third party (external private cloud). In this case, the infrastructure is fully dedicated to the organization and accessible via secure VPN-type networks.

3) Community Cloud: the infrastructure is shared by several organizations that have common interests. As with the private cloud, it can be managed by the organizations themselves or by a third party.

4) Hybrid Cloud: the infrastructure consists of two or more clouds (private, community or public) that remain unique entities but are bound together by standardized or proprietary technology, enabling the portability of data or applications.

III. PROPOSAL OF OUR SOLUTION

A. Toward an academic Cloud

The results of the survey conducted as part of this project led us to propose a cloud computing model adapted to the needs of the Algerian academic community.
One of the main proposed characteristics of the cloud services is to offer students access to services anytime, anywhere, from any device connected to the Internet, via a web interface or directly in their file explorer, and to let them use a development platform hosted on the university infrastructure to develop their applications. The cloud could provide economic benefits, the simplicity of pooled resources, high-performance storage capacity, and the flexibility of a cloud solution. The use of cloud computing in the university supposes the existence of certain computing elements, such as virtualization intelligence and system robustness [7], offering efficiency, availability, security, activity retention, scalability and interoperability. An important question arises in the choice of a university cloud: what type of cloud should be deployed? Worldwide, solutions have been chosen according to the specificity of each university, adopting either a private, hybrid or community cloud. The private cloud model seems the most appropriate for the academic world in general and for Algerian universities in particular. At this level, it is desirable that the cloud be managed by the university and accessible only to students, teachers/teacher-researchers and academic staff. This leads us to implement a private cloud.

B. Proposed Cloud model for the Algerian Universities

In order to allow Algerian universities to exchange and share information with each other through cloud technology, the Academic Research Network (ARN) [8] must be exploited to establish an inter-university cloud. To this end, we propose the global model of an inter-university architecture that must support the majority of the features of a university cloud presented above.
The architecture of the proposed university cloud includes a cloud server at each university; a central server at CERIST (Center for Research on Scientific and Technical Information), which ensures the synchronization between the different cloud servers and redirects external users to the cloud of their university; and a redundant copy of the central server at Algeria Telecom, which provides the same services as the CERIST server. The university cloud servers communicate with each other indirectly, passing through the CERIST server for any exchange and sharing of resources. One of the major problems of cloud computing is the security of communications and data; this architecture ensures the security of the data stored on the local and redundant servers, since if one server does not respond (breakdown, ...) the other server takes over. The communication between two entities is secured via a virtual private network (VPN).

C. The main features of this model

The proposed academic cloud architecture aims to make better use of university resources in order to ensure a better quality of training. The first goal is the storage and sharing of data, providing students with a space accessible directly from their operating systems; this therefore requires local synchronization that ensures data consistency. In our case, the storage resources are directly connected to the ARN network to provide a public information-exchange space. Users can manage their workspace and generate multiple instances of virtual machines from any computer system. A user may therefore: attach his storage space to a particular instance; save any work, knowing that if a particular instance is infected by a virus, the infection remains in that instance and does not affect the other instances; and retrieve end-of-year student projects and research work. As we have already mentioned, the main features offered by a university cloud are storage and data sharing, and these are what we considered in our cloud proposal.
At this level, the point is not to reinvent a new technique or a new tool but to use what exists and adapt it to the context of use. As the cloud model

to implement must support the two main features mentioned above, and since we had to adapt an existing tool, we carried out a comparative study of the main tools for storing and sharing data in order to choose the most appropriate one for our needs.

D. Comparative study of the main tools (proprietary and open source currently available)

The adoption of a cloud architecture requires knowledge of the various features essential to educational monitoring. In our study, we noted the importance of storage and data sharing for the higher-education community. To this end, the choice of a university cloud solution will depend on the basic services offered by the suppliers, as well as on the performance and scalability of the system. In what follows, we present a comparative study of the features offered by the storage and data-sharing tools currently available and most used. Proprietary cloud computing products such as Windows Azure [9], Amazon EC2 [2] and Dropbox [10] are mature but not extensible, since additional features and modules cannot be added to them. Moreover, as they are commercial products (payment required), the data are in most cases not hosted privately but at the supplier's premises, so using these solutions raises a safety issue for individuals who possess confidential data. Online data can also be used by the provider to serve targeted advertisements to the user, and the organization providing the service can take ownership rights over user data, for instance by removing data it deems troublesome. Eucalyptus [11] is a mature solution that allows a cloud infrastructure to be installed quite easily. However, the goal of cloud computing is to facilitate development, and Eucalyptus does not allow this; this is why it is now being abandoned in favor of other solutions. OpenNebula is another cloud solution that is flexible and mature.
Its design is very sleek and leaves a lot of freedom to the administrator who would deploy this solution, at the price of some effort to integrate advanced networking and other complementary storage solutions. The OpenStack [12] solution is still young but has great potential given its architecture, its community and the support of its partners. It is therefore a solution to watch, because we believe it can become the reference open-source cloud computing solution. OwnCloud [13] is an open-source solution that is flexible, scalable and mature. It offers several interesting features and uses protocols that promote its adoption; many companies and even individuals adopt it for private clouds.

E. Presentation of OwnCloud

OwnCloud is a project of the KDE (K Desktop Environment) family under the AGPL license (Affero General Public License), started in 2010, which aims to be a free alternative to proprietary online storage solutions. It allows the use of online storage services such as Amazon S3 or Google, or simply the storage of data on a local server with the advantages of a remote storage server (accessible from a computer as well as from a PDA, from home or from outside), but with full control over private life. OwnCloud is open source, which means that individuals have the ability to use OwnCloud without referring to the publisher; participation in the development of OwnCloud, and proposals of fixes and improvements, are possible. OwnCloud uses a web interface to view, share and edit data, based on the use of standards such as WebDAV, CardDAV, CalDAV and Ampache. In order to study the source code of OwnCloud closely, using the method of reverse engineering, we converted it to an XMI file with the ArgoUML tool; we were thus able to view the diagrams necessary for a proper understanding of the functioning of OwnCloud. F.
OwnCloud Extension 1) Synchronizing local client with the server in OwnCloud: The architectural model of academic Cloud proposed in this project will allow users to have an offline access to shared data. Server storage and sharing in this cloud is OwnCloud, which has a lot of features but has some shortcomings, that the most important is the synchronization of local machines with the cloud server. Given a cloud storage server, reachable by machinery, we explain the principle of synchronization as follows [14]: Through the heterogeneous network, the nodes (clients) are connected indirectly to each other. A master node (cloud server) is used to preserve, apply and transfer modified copies by a client. All client nodes are synchronized with the master node to implement indirectly directory synchronization with other nodes, as shown in Fig.1. The heart of the system is the directory synchronization between the client and the master node. Fig. 1. Principle of synchronization in the cloud Depending on how the file changes are detected during synchronization, there are two main methods of synchronizing directories (at this level, [15] a directory is considered as one of the files). One method is to compare the state of the file system with current status of the last synchronization. The other method is to track all user operations to explore all file changes. The proposed synchronization solution combines two methods, and runs in user space. A monitoring program runs continuously to monitor the operations of the user, then the comparison method is used to discover changes. As the method of comparing the file status is adopted in our synchronization program, then a file that keeps the state of the file system after the last synchronization is needed, and is called Image file. After synchronization, the contents of image for each node will be kept compatible with the state of the local file system. The directory structure and files will be synchronized. 
The clock at the nodes is not synchronized, the creation time, 97 IT4OD 2014

104 last modification date and some other attributes of the files must be synchronized, otherwise it is a source of confusion for the user after synchronization file attributes. Identification and elimination of conflicts is one of the fundamental questions for synchronizing multi-node directories. 2) Synchronization: Principle and working : The proposed synchronization solution ensures consistency and replication of data between the different clients and different server, detecting conflicts caused by simultaneous modification. This solution comes in three phases: 1) Preprocessing phase: It is to verify the existence of changed at the client and server. If the file does not exist, then we copy it directly, otherwise it is a modification and conflict detection necessary. So we appeal to the conflict detection algorithm. 2) Conflict detection phase (Algorithm): This phase is used to detect changes in conflicts at the client node and the server node. 3) Conflicts Fixing: A modification to a node that has not been sent to another node (working offline) creates a conflict. We propose the following solution that will correct the conflict : Client Side Modification Detection The principle of detecting a change of the file at a client may be explained as follows: Create a log directory in the server, move the file version (mirror file) that exists at the client node to the log directory, copy the file version that exists at the server in the client directory (so that the modifications are not definitively lost). Presentation of the conflict detection algorithm The process of conflict detection is always launched after editing a file. Editing a file is detected through the file version (when editing a file, its version is incremented). A file is represented by a triplet [ci, cv, lv ] where : ci : identifier of the node that updates, cv: current version of the file, lv : last version in the server. 
Two cases may arise for the detection and management of conflicts due to a change of files: Change detection on the server side: The principle of detecting a change in a file at the server can be explained as follows: IV. RELATED WORK: The advantages of using Cloud Computing in academia as well as features minimizing costs are ignored by users. In [16], Bharati Ainapure et al. present a study of the deployment of private Cloud in a technical institute. Thus, the main objective of the prototype presented is: managing the technology needs of universities such as the delivery of software, provision of platform development, data storage and computer effectively. The private cloud is built using eucalyptus technologies. But it does not provide some features that are specific to the environment as university after testing eucalyptus we believe that this is not a fully secure system. In [17], Zhang Tao et al. show that the emergence of cloud 98 IT4OD 2014

105 computing provides a new teaching networking solution. Taking into account the characteristics of cloud computing, they built a combined educational network using Cloud Computing. So far the system has achieved good results. In Heilongjiang, the use of this system continues, and tests are being carried to check the stability and scalability of the system. The E-Learning and Web 2.0 learning have totally changed the education system. Teachers and students can work together online without necessarily being at the university. Education has never been so easy. In [4], Ajith Singh et al. discuss the idea of cloud computing, the problems it tries to address, and its implementation in the university environment. Cloud computing from a computer management perspective, reduces dramatically costs, including management of energy resources, cooling and system management, while driving the use of servers and software licenses, which in turn reduces the purchase needs. Few researches are carried to investigate the academic cloud. Many managers in small firms and universities are not aware of the benefits of cloud computing. IT companies are keen to encourage the adoption of cloud computing education, for example, Google has developed a cloud-based Google Apps for its use in education [18], IBM launched the IBM Cloud Academy which provides a global forum for educators, researchers and professionals in the education industry to pursue cloud computing initiatives, develop skills and share the best ways to reduce costs operations while improving the quality and access to education [19]. In the same way, the private cloud based open-source can be adopted in technical institutes. It does not have to invest in new infrastructure. The university can create instances of the machine on the fly and create an infrastructure as needed. This allows significant savings in annual costs. 
Progressive Infotech, a leading independent provider of IT infrastructure solutions and services, has implemented a private cloud at the Indian Institute of Technology (IIT) of Delhi [20]. Carnegie Mellon University uses HP Converged Infrastructure approach to build their private cloud network by installing HP Blades, a rise of storage and Virtual Connect. The University of Western Australia has launched a 18-month project to consolidate about 1, 000 disparate servers to a platform in private cloud centers Amcom data [21]. UnivCloud is a concept of community cloud interuniversity, selected by the French government, ensuring sharing of information systems infrastructure to support the hosting services and the development of institutions of higher education and research. 14 member universities of UNPIdF participated in the first study phase of 18 months devoted to the analysis of the needs, the architecture and to the key technology choices. In 2013, a demonstration model will assess the potential of community infrastructure and to begin construction. V. CONCLUSION The paradigm of cloud computing is a new approach to produce a solution to old problems. This paradigm offers many benefits to businesses, industries, and universities. The deployment of private cloud for a university campus has many advantages such as giving each user has their storage space in the cloud and can take advantage of the services offered. By reusing existing IT infrastructure, it is not therefore necessary to invest in additional hardware. Some foreign projects cloud computing have emerged or are in progress for educational use. The main objective of the presented prototype is to manage the technology needs of universities such as storage and sharing of data efficiently, and ensuring data security. 
The private cloud is built using technologies of OwnCloud, but it does not provide some features that are specific to the needs of the university population as synchronizing the local computer with the cloud server, and server provisionning. Thus, our contribution in this project focused on the proposal of a inter-universities private cloud architecture. It mainly offers private cloud storage services and data sharing based on the use of the product OwnCloud. An extension product was defined: the client-side synchronization (synchronization algorithm). Future work could include: Load balancing at the server level (scheduling algorithm). REFERENCES [1] Ian Foster, Yong Zhao, Ioan Raicu, and Shiyong Lu. Cloud Computing and Grid Computing 360-Degree Compared. In Grid Computing Environments Workshop, GCE &#039;08, pages IEEE, November [2] Tuncay Ercana. Effective use of cloud computing in educational institutions. social and behavioral sciences, pages , [3] Mladen A. Vouk. Cloud Computing - Issues, Research and Implementations. CIT, 16(4): , [4] Ajith Singh N and M. Hemalatha. Cloud Computing for Academic Environment. In International Journal of Information and Communication Technology Research, pages Vol2, No.2, , [5] Peter Mell and Timothy Grance. The NIST Definition of Cloud Computing. Technical Report , National Institute of Standards and Technology (NIST), [6] T. Sridhar. Cloud Computing. The Internet Protocol Journal (IPJ), 12(3):2 19, [7] Mehmet Fatih Erko and Serhat Bahadir Kert. Cloud Computing For Distributed University Campus: A Prototype Suggestion. June [8] Portail ARN. [9] Windows Azure: Plateforme Cloud de Microsoft. [10] Marinela Mircea and Anca Ioana Andreescu. Cloud Computing Universities are Migrating to The Cloud for Functionality and Savings. [11] Daniel Nurmi, Rich Wolski, Chris Grzegorczyk, Graziano Obertelli, Sunil Soman, Lamia Youseff, and Dmitrii Zagorodnov. The eucalyptus open-source cloud-computing system. 
In In Proceedings of Cloud Computing and Its Applications [Online, page [12] OpenStack Open Source Cloud Computing Software. [13] owncloud. [14] Sundar Balasubramaniam and Benjamin C Pierce. What is a File Synchronizer? In Proceedings of the 4th Annual ACM/IEEE International Conference on Mobile Computing and Networking, MobiCom 98, pages ACM, [15] BAO Xianqiang, XIAO Nong, SHI Weisong, LIU Fang, MAO Huajian, and ZHANG Hang. SyncViews: Toward Consistent User Views in Cloud-based File Synchronization Services. In Chinagrid Conference (ChinaGrid), 2011 Sixth Annual, pages 89 96, [16] Deven N. Shah Bharati Ainapure, Sukhada Bhingarkar. Design of Private Cloud for Educational Sector using Eucalyptus. In nternational Journal of Advances in Computing and Information Researches, pages Vol 1, Issue 1, IT4OD 2014

106 [17] Zhang Tao and Jiao Long. The Research and Application of Network Teaching Platform Based on Cloud Computing. In International journal of Information and Education Technology, pages , [18] Tara S. Behrend, Eric N. Wiebe, Jennifer E. London, and Emily C. Johnson. Cloud computing adoption and usage in community colleges. Behaviour Information Technology, 30(2): , [19] IBM Press++. (2009, November 4). IBM Press Room. Retrieved March 21, 2012, from IBM: [20] Press Coverage, Information Week Magazine. April [21] itnews for Australian Business. UniWA consolidates to private cloud, March IT4OD 2014

107 AFuzzyChoq: A Multi Criteria Decision Method for Classification Problem Salsabil Goucem University of Tebessa, Computer science department, Tebessa, Algeria Saddek Benabied University of Tebessa, Computer science department, Tebessa, Algeria Hakim Bendjenna University of Tebessa, Computer science department, Tebessa, Algeria Abstract: Multi-criteria analysis of decision aid is a major area of research among its main objectives is to facilitate the missions and tasks ranking and arrangement; to achieve this objective there are many multi-criteria decision aid method where each method has strengths and weaknesses; allows the decision maker to express his preferences and his requirements and at the same time it allows to take into account the interaction among the criteria. In other words the existing methods manipulate the criteria as independent but in reality they interact. To bring this goal, that means helping the decision maker to make decisions based on criteria but considers that these criteria are interacting. We present in this paper, a multi-criteria decision aid method for Classification Problem called AFuzzyChoq. AFuzzyChoq is based on: - AHP method to calculate the weights of the criteria wherein each criterion is taken independently of the other, -the fuzzy measure to consider the interaction of criteria and calculate the weight of each subset criterion and - the Choquet integral as an aggregation operator. In order to support and facilitate the use of AFuzzyChoq we tried to develop an appropriate tool to the proposed method. This research also tested and discussed our proposal using an illustrative example of a real selection problem (prioritize candidates of job contest on the basis of the certificate) in Algeria is presented. Keywords: multi-criteria decision, analytical methods multicriteria, interactions of criteria, AHP, Integral Choquet, classification problem. I. 
INTRODUCTION To take the best decision and determination of the optimum is the desire and purpose of any director and responsible in a direction, a company or any position where decision making is important. Operational research is consistent and meets this challenge with its techniques and methods. But, then this discipline finds problems to do this mission correctly. We recall the most important; the thing known to man is that he takes the subjective side in its decisions and this lack of operational research because it takes only the objective side.operational research mathematically well-posed but does not present well the reality, techniques and methods of operations research work according to one criterion (the rare case of reality) and to optimize an objective function. Another problem the operations research is always seek to one optimal solution or near optimal. Another approach emerges to solve these problems and achieve the main goal is the multi-criteria analysis for decision aid. The multi-criteria decision aid aims, as its name suggests providing tools to make decision him allowing to it to progress in solving a decision problem where multiple views, often contradictory, must be taken into account [1] MCA in fact provide the means of performing complex trade-offs on multiple evaluation criteria, while taking the DM s preferences into account. [2].We use in our process: The AHP method to decompose the problem into a hierarchical structure and calculate the weights of criteria wherein each criterion is taken independently of the other. 
These weights allows decision makers according to his views to express the importance of each criterion (to express preferences and requirements); Using fuzzy measure to consider the interaction among the criteria and calculate the weight of each subset criterion; Choquet Integral as aggregation operator to aggregate the evaluations of alternatives according to the classification criteria to get an overall evaluation allows the ranking of alternatives. After the Introduction Section, the paper is organized as follows: the subsequent section contains background for the analysis multi-criteria for decision aid, Analytical Hierarchy Process AHP, The fuzzy measure and the Integral Choquet. Next in section 3, the classification method proposed AFuzzyChoq. Later in section 4, an illustrative example of distribution of the new jobs followed by the conclusions and further research directions in sections 5. II. BACKGROUND In order to be as far as possible self-contained, we give in this section necessary introduction and definitions for multicriteria decision making, introduction and definitions for AHP method, Choquet integral and λ-fuzzy measure. A. Multi-criteria decision analysis 1) Definitions According to Bernard Roy: The decision support is the activity of one who, building on models clearly explained but not necessarily clearly formalized, helps to get some answers to questions that arose involved in a decision process, elements intersecting at inform the decision to prescribe and normally, 101 IT4OD 2014

108 or simply to promote, conduct likely to increase the coherence between the evolution of a process on the one hand, the objectives and value system in whose service this speaker is placed to On the other hand [3] Also the multi-criteria analysis problem can be defined [4] as (AL,f), where АL is a finite feasible set of alternatives AL ={ a 1, a 2,... a n }and f is a vector-valued (k -dimensional) criterion f ={ f 1, f 2,... f n } For an alternative a AL, f j (a), j {1,..., k}, represents the evaluation of the j-th criterion. Each criterion f j is assumed to be either maximized max a AL fj (a) or minimized min a AL f j (a). The notation a ij = fj( a i ) may also be used to denote the j-th criterion value for alternative a i. The matrix A={a ij }, i=1,...,n; j = 1,...k is denoted as a decision matrix. 2) Methods of Multi-criteria Analysis The field of multiple criteria decision analysis offers the Decision Maker (DM) a selection of methods and operational tools that explicitly account for the diversity of the viewpoints considered. The common purpose of these methods is to evaluate and choose among alternatives based on multiple criteria using systematic analysis that overcomes the limitations of unstructured individual or group decision making [5], [6]. Many MCDM methods have been proposed in the literature. Unfortunately, there is no method for choosing among them the most appropriate for a given decision problem, the choice remaining a subjective task. Furthermore, each method may product different rankings [7]. Given these drawbacks, it is suggested the use of more than one MCDM method in order to enhance the selection process. 
[8] The following main categories of problems are considered on the basis of MCDA [6]: Sorting alternatives into classes/categories (e.g., unacceptable, possibly acceptable, definitely acceptable, and so on); Screening alternatives a process of eliminating those alternatives that do not appear to warrant further attention, that is, selecting a smaller set of alternatives that (very likely) contains the best /trade-off alternative; Ranking alternatives (from best to worst according to a chosen algorithm); Selecting the best alternative from a given set of alternatives; and Designing (searching, identifying, creating) a new action/ alternative to meet goals. The main families of methodologies include: 1. Outranking methods, such as the Elimination and choice Reflecting the Reality (ELECTRE) family [9], [10], the Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE) I and II methods [11], and Regime Method Analysis [12]; 2. Value or utility function-based methods, such as the Multi-Attribute Utility Theory (MAUT) [13]; the Simple Multi-Attribute Rated Technique (SMART) [5]; the Analytic Hierarchy Process (AHP)[14]; and the most elementary multicriteria technique, the Simple Additive Weighting (SAW); and 3. Other methods like Novel Approach to Imprecise Assessment and Decision Environment (NAIADE) [15], Flag Model [16], Stochastic Multi-objective Acceptability Analysis (SMAA) [16]. Each method constructs first a m odel of decision makers' preferences and then exploits this model in order to work out a recommendation. B. The analytical hierarchy process method The Analytical Hierarchy Process (AHP) developed by Thomas Saaty is probably the best known and most widely used MCA approach. 
[17][14][18].AHP helps capture both subjective and objective evaluation measures, providing a useful mechanism for checking the consistency of the evaluations thus reducing bias in decision making [19].AHP is a rigorous methodology which is divided into a series of important steps: construction of the hierarchy, priority setting and checking the logical consistency of the analysis. [20]. The AHP method can be described in 5 steps [21]: Step 1: Problem definition. In this step we define the problem goal (objective). Step 2: Graphical representation of the problem. This step is divided into three sub-steps: state the objective (according to goal encountered in step 1), define the criteria and choose the alternatives. Step 3: Establish priorities. Pair wise comparisons are used to estimate the ratio of relative importance of two alternatives in terms of the criteria. To accomplish this, Saaty uses the scale [1..9] to rate the relative importance of one criterion over another, and constructs a matrix of the pair wise comparison ratings. If a criterion is less important than the other, then the inverse preference is rated in the scale 1, 1/2, 1/3,, 1/9 [14]. Step 4. Synthesis. In this step we calculate the priority of each alternative solution and criteria being compared. Several mathematical procedures can be used for the synthesis, for example eigenvalues and eigenvectors or a simple averaging [14]. Step 5. Consistency. The quality of the decisions is guaranteed through a measure of consistency of pair wise comparison judgments by computing a consistency ratio (CR): CR=CI\RCI. (1) Where: CI represents the consistency index; RCI gives the random consistency index. RCI is an average random index derived from a sample of size 500 of randomly generated reciprocal matrices [22] and depends on the number of elements being compared (see Table 1). CI is calculated using: CI=(ƛ max -n)\(n-1). (2) 102 IT4OD 2014

109 Where: n is the number of items or alternatives being compared; ƛ max is calculated using: A.W\W=CV. (3) Where: A represents the pair wise comparison matrix; W the priority vector of alternatives with respect to a certain criteria. And CV the consistency vector. CR is designed in such a way that values of the ratio exceeding 10% are indicative of inconsistent judgement; in such cases, the decision maker would probably wants to revise the original values in the pair wise comparison matrix [22]. TABLE I. RCI values per number of compared items [22] N RCI C. Interacting criteria, fuzzy measure and Choquet integral A common method to evaluate criteria set is to use an aggregator operator to reduce the multi-criteria problem into a single global criterion problem by aggregating all the elements [23]. A traditional method uses a weighted sum (or weighted mean) [24]. This is a simple approach; however, despite its simplicity it has a drawback in that it assumes that the criteria are independent. If the criteria can interact with each other, the weighting factor must be replaced by a set function on set of criteria, which not only considers the weighting factor on each criterion but also weighting on each subset of criteria [23]. Reference [25] gives an overview of different types of interaction among criteria that could exist in the decision making problem. Three kinds of interaction defined and described in [25] are as follows: correlation, complementary, and preferential dependency. For example, consider the problem of evaluating a given car based on three criteria {fuel efficiency, luxury, and price}. A highly luxurious car generally comes with a higher cost. In this case, luxury and price form positive correlating criteria, and the evaluation will be an overestimate [23]. 
Clearly, when such complex interactions exist among criteria, it is necessary to use a w ell-defined weighting function on a subset of criteria rather than a single criterion during global evaluation. One such methodology for evaluation is the Choquet integral with the use of fuzzy measure as a weighting function. Choquet Integral was introduced by Choquet (1954) based in the theory of Fuzzy measures. The use of the Choquet integral to decision support systems on several industrial applications has been proposed by many authors [26]. In comparison to other multi-criteria analyses techniques, Choquet integral has two characteristics [27]: It extends the weighted approach of other techniques to a fuzzy set of criteria by taking into account interactions between them; It allows expressing the importance degree of a combination of criteria in a unique index. As mentioned in [27] the multi-criteria model is composed of 3 main components: The criteria formalization is realized thanks to utility functions and it allows establishing the index measure between the criteria. The fuzzy measure is a set function that is build in order to model importance and interaction between criteria. Finally the Choquet integral establishes the overall evaluation. It computes a kind of average value of the utilities taking into account the importance and interactions between the criteria. In the rest of this paper, we will work on Χ a finite universe of n elements (criteria). The λ-fuzzy measure the best example for fuzzy measure is the tool we used to express the importance and the interactions between two or more criteria. It is a set function that defines a weight for each subset P(X) of the set of the criteria. C1.The fuzzy measure The fuzzy measures are able to model the dependence between criteria in many situations, whatever the nature of the dependence. Indeed, these measures have been proposed by Sugen. J. Moscarola for generalized additive measures. 
[28] A fuzzy measure of the set X is a function g :2 X -> [0,1] such that the following condition are satisfied 1) g(φ ) = 0 2) g(x) =1 (1) and (2) is also called boundary condition 3 If A Β X then g(a) g(b) this property is called monotonicity Where g(a) is indicates the weights of importance for a set A. A fuzzy measure is called additive if g(a B) = g(a) + g(b) whenever A B =φ,super additive if g(a B) g(a) + g(b) whenever A B =φ and sub additive if g(a B) g(a) + g(b)whenever A B =φ. C2.Definition of Sugeno λ -Fuzzy Measure [28] Let X = {x 1, x 2, x 3,.x n } be a f inite set and consider λ (-1, ), an λ -measure is a function g λ : X 2 ->[0,1] such that it satisfied the following condition: 1) g λ (X) = 1 (4) 2) If A,B 2 X then g λ (A U B) = g λ (A)+ g λ (B)+ λ g λ(a)g λ (B) With A B= φ (5) Moreover, let X be a finite set, X = {x 1, x 2, x 3,.x n } and P(X) be the class of all subsets of X the fuzzy measure g(x)=g λ ({x1, x2, x3,.xn}) can be formulated as g λ ({x1, x2, x3,. xn}) = nn gg ii + λ g i1. g i2 + + λ n 1 g 1 g 2 ii=1 n 1 n i1=1 i2=i1+1 g n = 1 [ nn (1 + λg λ ii=1 i) 1] wwheeeeee λ ( 1, ) (6) nn g λ ({x1, x2, x3,. xn})λ = ii=1 (1 + λg i ) 1 (7) 103 IT4OD 2014

110 nn g λ ({x1, x2, x3,. xn})λ + 1 = ii=1 (1 + λg i ) (8) bbbb dddddddddddddddddddd g λ ({x1, x2, x3,. xn}) = 1 nn λ + 1 = ii=1 (λg i + 1) (9) Now we evaluate the value of λ.according to the fundamental theorem regarding the λ-fuzzy measure, λ-value has three cases, as follows: nn (ii)iiii ii=1 gg ii > gg(xx) ttheeee 1 < λ < 0. (10) nn (iiii)iiii ii=1 gg ii = gg(xx) ttheeee λ = 0. (11) nn (iiiiii)iiii ii=1 gg ii < gg(xx) ttheeee λ > 0. (12) simplicity of using the AHP method and the possibility of measuring the consistency of the solution reached by the decision maker and makes a r eassessment if necessary. We emphasize here, that we did use the AHP method to enable decision makers to determine the importance of each criterion relative to others, to get the weight of each criterion. Thus, we perform here only binary comparisons between a limited numbers of criteria and reduced without making binary comparisons between alternatives. At the end of this step there is obtained the importance (weight) of each criterion independently of the other ie that is not obtained the importance of two or more criteria (the subset of criteria) which has the interaction among the criteria. C3.Choquet integral [28] Finally the aggregation function establishes the overall evaluation of alternatives. We use the Choquet integral that computes an average value of the utilities taking account of the interaction and the importance of the criteria. This integral is considered as an aggregation operator. Let us suppose that g be a f uzzy measure on X, then Choquet integral of a function :f :x->[0, ] w.r.t fuzzy measure g is defined (cc) ff dddd = nn ii=1 ff(xx ii ) ff(xx ii 1 ) gg(aa ii ) Where A i X (13) for i=1,2,3, n.{f(x1),f(x 2 ),..} : are the ranges and they are defined as : III. f(x1) f(x 2 ) f(x 3 ),. f(x n ) et f(x 0 )=0. (14) AFuzzyChoq : A MCDA FOR CLASSIFICATION PROBLEM We address the problem type "classification γ". 
We consider a problem with a set of alternatives: A = {A1,A2,, An}, which are evaluated according to a set of C criteria = {C1, C2, Cj}. We propose a Method Multi-criteria for aid decision AFuzzyChoq for ranking alternatives that can in both : take into account the interactions of criteria ie allows the decision maker to express the importance of each criterion independent of the other as its importance relative to other criteria which represents the interaction among the criteria this the case in reality; and allows the intervention of the decision maker's preference. As depicted in Figure 1 AFuzzyChoq consists of three phases. Phase 1: Apply the AHP method to decompose the problem into a hierarchical structure and calculate the weights of criteria wherein each criterion is taken independently of the other. These weights allow decision makers according to his views to express the importance of each criterion (to express preferences and requirements). This choice is motivated by the Figure 1. AFuzzyChoq steps Phase 2: Use the λ -fuzzy measure for considering the criteria interaction and calculating the weight of each sub-set of criterion. This technique is most often used to answer this question. At the end of this step there is obtains the importance of two or more criteria (the subset of criteria) which shows the interaction among the criteria. Phase 3: Use the Choquet integral as an aggregation operator based on evaluation of alternatives on each criterion and the weight of each subset of criteria to aggregate the evaluations of alternatives according to the classification criteria and its interactions to get an overall evaluation allows the ranking of alternatives. At the end of this step there is obtained a total score that alternatives are classified according to this score. IV. ILLUSTRATIVE EXAMPLE The previous sections presented our process AHP-Fuzzy- Choq and discussed advantages when using it. 
This section aims to operationalize it with a real-world system by using an example issued from the distribution of the new jobs in Algeria. The empirical example background, the problem 104 IT4OD 2014

111 statement and the results of applying this process are discussed below. A. Background and problem statement The equitable distribution of new jobs to meet the candidates of job contest on the basis of the certificate is difficult and embarrassing for the selection committee. This is evidenced by conciliation between the number of new jobs available and the number of applicants who always are the most and satisfy the greatest possible number of candidates. In evaluating candidates the managers based on rating scale of the public function by the following four criteria: the relevance and suitability of the candidate's qualifications with job requirements requested to enter the contest, the experience acquired by the candidate in the same or an equivalent job, date of graduation and the results of the interview with the selection committee. The objective is to provide an overall rating of ranking to candidates allowing them sorted in descending order. Next we considered only the n first candidates corresponding to the n available new jobs. B. Candidates prioritization Analytical multi-criteria methods for decision support are available to facilitate this mission as much possible. We apply AFuzzyChoq to classify the candidates need job according to criteria defined by the public function and to select a number of candidates equals the number of new jobs available. Now, we can formulate the above problem as a multi-criteria decision analysis problem by defining the set of: criteria, alternatives, and the goal as follows: Criteria: X={Cr1: the relevance of the candidate's qualifications, Cr2: the experience acquired, Cr3: date of graduation, Cr4: the results of the interview} set of criteria. Their weights are respectively w1, w2, w3, w4 where wiϵ]0,1[alternatives: W= {c1,c2, cm} set of candidates on which criteria is to be evaluated. And Goal: Evaluate the set of candidates/alternatives {c1, c2,. cm} based on set of criteria X. 
Table II lists the candidates together with their evaluations on each criterion. We then determine the weight of each criterion: to evaluate each candidate against each criterion, the relative importance of the criteria must be known, and it is expressed by their weights. The selection committee (i.e., the policy makers) is best placed to judge which criteria are more important than others and which criteria are of equal importance; in short, it is able to compare the criteria. To let the policy makers express these importances, we chose the AHP method (Phase 1) to compute the criteria weights. As an example, we have five candidates to rank for two jobs. The selection committee performs pairwise (binary) comparisons between criteria to obtain the weight of each criterion, expressing from its own point of view which criteria are most important, fairly important, of equal importance, and so on; the importance matrix is then computed according to the AHP method. Figure 3 shows the steps of calculating the criteria weights.

Figure 2. Hierarchy form of the problem

TABLE II. Evaluations and scores of candidates
Candidate | Cr1 | Cr2 | Cr3 | Cr4
Alli Madi | | | |
Farid drid | | | |
Mossa Djaba | | | |
Yahia Batouri | | | |
Hichem Mahdi | | | |

As mentioned earlier, our process prioritizes the candidates in three steps. The first step yields the hierarchical form of the problem and the weight of each criterion.
To obtain the hierarchical form of the problem (Figure 2), we use the AHP method (Phase 1), which decomposes the job-distribution problem into the following hierarchy: level 0 holds the goal, namely the ranking of the candidates; the criteria occupy level 1; and the applicants occupy the lowest level of the hierarchy (level 2). The goal, then, is to rank the applicants based on their assessments on the criteria established by the public service.
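Phase 1 (AHP) as described above can be sketched in a few lines: normalize the columns of the pairwise comparison matrix, average the rows to get the weights, then check the consistency ratio. The judgments in `A` below are illustrative only, not the selection committee's actual comparisons.

```python
# Sketch of Phase 1 (AHP): criteria weights from a pairwise comparison
# matrix, plus the consistency check. Matrix values are illustrative.

def ahp_weights(matrix):
    """Normalize each column to sum to 1, then average each row."""
    n = len(matrix)
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    normalized = [[row[j] / col_sums[j] for j in range(n)] for row in matrix]
    return [sum(row) / n for row in normalized]

def consistency_ratio(matrix, weights, random_index=0.90):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1); RI = 0.90 for n = 4."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    return ((lambda_max - n) / (n - 1)) / random_index

# Illustrative pairwise judgments for Cr1..Cr4 on Saaty's 1-9 scale
A = [[1,   2,   3,   4],
     [1/2, 1,   2,   3],
     [1/3, 1/2, 1,   2],
     [1/4, 1/3, 1/2, 1]]

w = ahp_weights(A)            # comes out close to the paper's (0.44, 0.28, 0.16, 0.10)
cr = consistency_ratio(A, w)  # below 0.10, so these judgments would be accepted
```

With this particular matrix the weights land near (0.47, 0.28, 0.16, 0.10), comparable to the weights used in the example.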
Figure 3. Steps of calculating the weight of criteria

Also in Phase 1, after calculating the criteria weights, the degree of coherence is verified by computing the coherence ratio and checking whether RC < 0.10. According to formulas (1), (2) and (3), multiplying the comparison matrix by the weight vector gives

0.44·(0.462, 0.231, 0.152, 0.152)T + 0.28·(0.522, 0.130, 0.307, 0.086)T + 0.16·(0.461, 0.621, 0.153, 0.076)T + 0.10·(0.333, 0.222, 0.333, 0.111)T = (0.457, 0.256, 0.95, 0.113)T

CC1 = 0.457/0.44 = 1.038; CC2 = 0.256/0.28 = 0.914; CC3 = 0.95/0.16 = 5.937; CC4 = 0.113/0.10 = 1.13
λmax = (1.038 + 0.914 + 5.937 + 1.13)/4 = 2.25
IC = 0.58, so RC = 0.58/0.90 = 0.64 < 0.10, demonstrating that the preferences are chosen coherently.

We now move to the next step: calculating the importance of each subset of criteria, so as to take the interactions among the criteria into consideration, using the λ-fuzzy measure (Phase 2). According to the aforementioned definition, we begin by calculating the value of λ from equation (9):

λ + 1 = Π(i=1..4) (λwi + 1)
λ + 1 = (0.44λ + 1)(0.28λ + 1)(0.16λ + 1)(0.10λ + 1)
λ + 1 = 0.001λ^4 + 0.04λ^3 + 0.31λ^2 + 0.98λ + 1, hence 0.001λ^4 + 0.04λ^3 + 0.31λ^2 - 0.02λ = 0

We solved this equation and selected the root λ ∈ ]-1, +∞[: from λ ∈ {-29.45; -10.61; 0; 0.063} we take λ = 0.063. Now, according to equations (4) and (5), the values of the fuzzy measures are as follows:
µ1{Cr1} = w1 = 0.44
µ2{Cr2} = w2 = 0.28
µ3{Cr3} = w3 = 0.16
µ4{Cr4} = w4 = 0.10
µ12{Cr1,Cr2} = µ1 + µ2 + λµ1µ2 = 0.44 + 0.28 + 0.063·0.44·0.28 = 0.687
µ13{Cr1,Cr3} = µ1 + µ3 + λµ1µ3 = 0.60
µ14{Cr1,Cr4} = µ1 + µ4 + λµ1µ4 = 0.54
µ23{Cr2,Cr3} = µ2 + µ3 + λµ2µ3 = 0.44
µ24{Cr2,Cr4} = µ2 + µ4 + λµ2µ4 = 0.38
µ34{Cr3,Cr4} = µ3 + µ4 + λµ3µ4 = 0.26
µ123{Cr1,Cr2,Cr3} = µ1 + µ2 + µ3 + λµ1µ2µ3 = 0.88
µ124{Cr1,Cr2,Cr4} = µ1 + µ2 + µ4 + λµ1µ2µ4 = 0.82
µ134{Cr1,Cr3,Cr4} = µ1 + µ3 + µ4 + λµ1µ3µ4 = 0.70
µ234{Cr2,Cr3,Cr4} = µ2 + µ3 + µ4 + λµ2µ3µ4 = 0.54
µ1234{Cr1,Cr2,Cr3,Cr4} = 1 (by definition)

We must now aggregate the evaluations of each applicant into an overall score that allows a ranking. For this aggregation of the applicants' assessments on the criteria, we apply the Choquet integral (Phase 3).
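Since sum(w) = 0.98 < 1 for the weights w = (0.44, 0.28, 0.16, 0.10) of this example, the non-zero root of equation (9) is positive, so it can be found numerically instead of expanding the quartic by hand. A minimal sketch (the bisection bounds and tolerance are implementation choices, not part of the method):

```python
# Sketch of Phase 2: identify the Sugeno lambda-measure from the weights.

def solve_lambda(w, hi=1000.0, tol=1e-12):
    """Positive root of prod(1 + l*w_i) = 1 + l, valid when sum(w) < 1."""
    def f(l):
        p = 1.0
        for wi in w:
            p *= 1.0 + l * wi
        return p - (1.0 + l)
    lo = 1e-9  # just above the trivial root l = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def mu(subset, w, lam):
    """Lambda-measure of a set of criterion indices: (prod(1 + lam*w_i) - 1) / lam."""
    p = 1.0
    for i in subset:
        p *= 1.0 + lam * w[i]
    return (p - 1.0) / lam

w = [0.44, 0.28, 0.16, 0.10]
lam = solve_lambda(w)               # about 0.06, close to the paper's 0.063
mu12 = mu({0, 1}, w, lam)           # mu{Cr1, Cr2}
mu_all = mu({0, 1, 2, 3}, w, lam)   # 1 by construction
```

The small difference from the paper's 0.063 comes from the rounded polynomial coefficients used there; the exact product form gives a root slightly above 0.06.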
We obtain the list of candidates, each with an overall score that allows ranking them. According to equations (13) and (14), the Choquet integral gives the overall evaluation of each candidate according to the importance of each criterion, while at the same time taking into account the interactions among the criteria (the importance of the subsets of criteria):

SC1 = E1(C1)·µ1234{Cr1,Cr2,Cr3,Cr4} + (E2(C1) - E1(C1))·µ234{Cr2,Cr3,Cr4} + (E3(C1) - E2(C1))·µ34{Cr3,Cr4} + (E4(C1) - E3(C1))·µ4{Cr4}
SC1 = 22 + (30 - 20)(0.54) + (40 - 30)(0.26) + (50 - 40)(0.10)
SC1 = 31

(SC1: the overall score of the first candidate; E1(C1): the evaluation of candidate 1 on criterion 1.) And so on: SC2 = 19.32,
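The aggregation step above, sorting a candidate's scores and weighting each successive increment by the fuzzy measure of the remaining criteria, can be sketched as follows. The measure dictionary lists only the subsets reached by this particular score ordering, with the µ values computed in Phase 2; the ascending scores (20, 30, 40, 50) are the illustrative values appearing in the SC1 computation.

```python
# Sketch of Phase 3: the discrete Choquet integral. `measure` maps
# frozensets of criterion indices to their fuzzy-measure values.

def choquet(scores, measure):
    """Sum of (x_(i) - x_(i-1)) * measure({criteria with score >= x_(i)})."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    remaining = set(order)
    total, prev = 0.0, 0.0
    for i in order:
        total += (scores[i] - prev) * measure[frozenset(remaining)]
        prev = scores[i]
        remaining.remove(i)
    return total

measure = {
    frozenset({0, 1, 2, 3}): 1.00,  # mu1234
    frozenset({1, 2, 3}): 0.54,     # mu234
    frozenset({2, 3}): 0.26,        # mu34
    frozenset({3}): 0.10,           # mu4
}
score = choquet([20, 30, 40, 50], measure)
# 20*1.00 + 10*0.54 + 10*0.26 + 10*0.10 = 29.0
```

With an additive measure this reduces to a plain weighted average; the non-additive µ values are exactly what lets the interactions between criteria influence the overall score.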

SC3 = 41.36, SC4 = 24.6, SC5 = 25.56. The candidates are now ranked according to these overall scores. Table III lists the candidates with their overall scores and the resulting ranking.

TABLE III. The ranking and scores of candidates
Rank | Candidate | Score
1 | Mossa Djaba | 41.36
2 | Alli Madi | 31
3 | Hichem Mahdi | 25.56
4 | Yahia Batouri | 24.6
5 | Farid drid | 19.32

Therefore the two candidates who win the two new jobs are Mossa Djaba and Alli Madi.

V. CONCLUSION AND PERSPECTIVES
Multi-criteria decision aid is a very important research area: many decision problems are multi-criteria, and the ranking (or classification) problem is among those most frequently encountered in practice. The purpose of this article was therefore to propose a method, AFuzzyChoq, based on the principles and methods of multi-criteria analysis, to solve this type of problem, and to discuss our proposal on an illustrative example of a real selection problem in Algeria (prioritizing the candidates of a certificate-based job contest). AFuzzyChoq uses the AHP method to assign weights to the criteria, the fuzzy measure to account for interactions and dependencies between criteria, and the Choquet integral as the aggregation operator that computes the global priorities. In order to support and ease the use of AFuzzyChoq, we plan to develop an appropriate tool for the proposed method and to apply and test it on a case study of another real selection problem (prioritizing applicants for social housing) in the Wilaya of Tebessa, Algeria.

REFERENCES
[1] P. Vincke (1989). L'aide multicritère à la décision. Editions de l'Université de Bruxelles, Bruxelles.
[2] F. Rossi and A. Tsoukias (eds.) (2009). Algorithmic Decision Theory. Proceedings of the First International Conference, ADT 2009, Venice. Springer Verlag, Berlin.
[3] B. Roy (1985). Méthodologie multicritère d'aide à la décision. Economica, Paris.
[4] T. Hanne and H. L. Trinkaus (2003).
KnowCube for MCDM: Visual and Interactive Support for Multicriteria Decision Making. Published reports of the Fraunhofer ITWM, 50.
[5] D. von Winterfeldt and W. Edwards (1986). Decision Analysis and Behavioral Research. Cambridge University Press, Cambridge (UK).
[6] V. Belton and T. Stewart (2002). Multiple Criteria Decision Analysis: An Integrated Approach. Kluwer Academic, Dordrecht (NL), p. 372.
[7] M. R. Mahmoud and L. A. Garcia (2000). Comparison of different multicriteria evaluation methods for the Red Bluff diversion dam. Environmental Modelling & Software 15.
[8] L. Duckstein, W. Treichel and S. E. Magnouni (1994). Ranking ground-water management alternatives by multicriterion analysis. Journal of Water Resources Planning and Management, ASCE 120.
[9] B. Roy and Ph. Vincke (1981). Multicriteria analysis: survey and new directions. European Journal of Operational Research 8.
[10] P. Vincke (1992). Multicriteria Decision Aid. Wiley, New York.
[11] J. P. Brans and Ph. Vincke (1985). A preference ranking organisation method: the PROMETHEE method for MCDM. Management Science 31.
[12] P. Nijkamp, P. Rietvelt and H. Voogd (1990). Multi-criteria Evaluation in Physical Planning. North-Holland, Amsterdam.
[13] R. Keeney and H. Raiffa (1976). Decisions with Multiple Objectives: Preferences and Value Tradeoffs. John Wiley & Sons, New York.
[14] T. L. Saaty (1980). The Analytic Hierarchy Process. McGraw-Hill, New York.
[15] G. Munda (1995). Multicriteria Evaluation in a Fuzzy Environment: Theory and Applications. Physica-Verlag, Heidelberg.
[16] P. Nijkamp and R. Vreeker (2000). Sustainability assessment of development scenarios: methodology and application to Thailand. Ecological Economics 33: 7-27.
[17] T. L. Saaty (1982). Decision Making for Leaders. Lifetime Learning Publications, Wadsworth, Belmont.
[18] T. L. Saaty (1995). Decision Making for Leaders: The Analytic Hierarchy Process for Decisions in a Complex World.
RWS Publications, Pittsburgh.
[19] A. Ishizaka and M. Lusti (2003). An intelligent tutoring system for AHP. Proceedings of the 9th International Conference on Operational Research KOI 2002, Trogir.
[20] T. L. Saaty (1984). Décider face à la complexité : une approche analytique multicritère d'aide à la décision. Entreprise Moderne d'Edition, Paris.
[21] F. Vieira, I. S. Brito and A. Moreira. Using multi-criteria analysis to handle conflicts during composition.
[22] E. Triantaphyllou. Multi-Criteria Decision Making Methods: A Comparative Study. Kluwer Academic Publishers.
[23] P. Sridhar, A. M. Madni and M. Jamshidi. Multi-criteria decision making in sensor networks. IEEE Instrumentation & Measurement Magazine.
[24] M. Grabisch (1996). The application of fuzzy integrals in multicriteria decision making. European Journal of Operational Research 89(3).
[25] J.-L. Marichal (2000). An axiomatic approach of the discrete Choquet integral as a tool to aggregate interacting criteria. IEEE Transactions on Fuzzy Systems, vol. 8, Dec. 2000.
[26] M. Grabisch and M. Roubens. Application of the Choquet integral in multicriteria decision making. In M. Grabisch, T. Murofushi and M. Sugeno (eds.), Fuzzy Measures and Integrals: Theory and Applications. Physica-Verlag.
[27] M. Grabisch, I. Kojadinovic and P. Meyer. A review of capacity identification methods for Choquet integral based multi-attribute utility theory: applications of the Kappalab R package. European Journal of Operational Research.
[28] Muhammad Ayub (2009). Choquet and Sugeno Integrals. Blekinge Institute of Technology.

Topic 4: Networks and Embedded Systems

Personalized access to heterogeneous data sources for organizing large enterprise information systems

Aïcha AGGOUNE, Department of Computer Science, Laboratoire LabSTIC, Université 8 mai 45 Guelma, Algeria
Abdelkrim BOURAMOUL, Department of Fundamental Computer Science and its Applications, Laboratoire MISC, Université Constantine 2, Algeria
Mohamed Khiereddine KHOLLADI, Rector of the Université d'El-Oued, Laboratoire MISC, Université Constantine 2, Algeria

Abstract: The evolution of enterprise information systems (EIS) causes an explosion in the number of heterogeneous data sources, which in turn makes access to relevant information a tedious problem. Our work seeks to determine which sources are relevant to the usages and clients of the enterprise, given the context in which they are immersed. In this article we present a personalized mediation architecture for adapting a reduced number of information sources to the structure of the user profile. To this end, ontology-based models of the user profile and of the information sources were built. From these two semantic models, the relevant sources are filtered automatically by matching the concepts of the user-profile model with those of the information-source model. The relevant sources are then used in the mediation system as personalizable local sources that facilitate information retrieval.

Keywords: personalization, heterogeneous data sources, EIS organization, mediation system, ontology

I. INTRODUCTION
Returning to the user relevant information drawn from heterogeneous sources is a major challenge for the IT industry.
Whether in the context of enterprise information systems, electronic commerce, or access to knowledge, the relevance of the delivered information, its intelligibility, and its adaptation to clients' usages and preferences are key factors in the success or rejection of these systems. Personalizing information retrieval is therefore necessary: it consists of modeling the user and integrating the resulting profile into the information-access process. The relevance of the information thus depends on the match between the query and the set of profile elements that are perceptible during the search. The objective of this article is twofold: on the one hand, to use a mediation system in a large enterprise system to build a uniform view of heterogeneous data, giving the user the impression of searching in a homogeneous, centralized system; on the other hand, to integrate the various kinds of knowledge making up a profile into the mediation process and to establish a reference element for filtering the large mass of an enterprise's information sources. This reference element takes the form of an ontology-based generic model describing the types of information needed to define a profile. A user-profile management platform has been developed for later use in the mediation system. In addition, an architecture of the personalized mediation system has been designed with the goal of facilitating information retrieval in large enterprise systems such as SONATRACH, together with a method for automatically filtering the information sources to be integrated into the mediation system.
The article comprises five sections. After this general introduction, the second and third sections review related work on handling semantic heterogeneity and on building user profiles, respectively. Section four describes our mediation architecture, whose goal is to adapt the relevant sources to the user profile in large enterprise information systems. Finally, the last section gives a general conclusion and perspectives for future work.

II. RELATED WORK
In order to exploit several heterogeneous information sources in information retrieval (IR), much work has addressed the integration of such sources. Two categories of approaches have emerged: the first, the virtual approaches, is based on the use of mediation systems; the second, the materialized approaches, is founded on the construction of a data warehouse [2]. The principle of the first category is to build a mediation system defining an interface between the user and the set of sources accessible via the Web [3]. Originally this type of system came from the world of federated databases; it consists of defining a global schema as a function of the schemas of the data sources to be integrated [4]. The mediator comprises a global schema, or global ontology, which provides a structured vocabulary supporting the expression of queries. The information sources relevant to answering a query are computed by rewriting the query in terms of views, and these queries are generally translated by adapters, or wrappers, into queries written in the specific language of each source [5], [1]. Personalization in a mediation system must take into account not only the users' profiles but also the semantic description of the data sources, defined by mediation queries [23]. [24] propose a personalized mediation approach at the query-reformulation level, which reinterprets the user's intention, expressed in the query, as a more complete query accounting for both the user profile and the description of the data sources. The second category of approaches is founded on the construction of a data warehouse. It views integration as the construction of databases from heterogeneous, distributed information sources [6]. These data must be organized, coordinated, integrated, and stored to give the user a global view of the information, in contrast to the first category of approaches, where the data stay in their original sources [7] [8]. Finally, this type of system lets users pose their queries directly against the warehouse [9]. Personalization in data warehouses modifies the multidimensional structures and the data-presentation mechanisms through better knowledge of the users [25].
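The mediator/wrapper principle described above, where a single global interface dispatches a query to per-source adapters that each hide their source's local access method, can be sketched as follows. The classes and the in-memory "sources" are illustrative stand-ins, not an actual mediation implementation.

```python
# Illustrative sketch of a mediation system: the mediator exposes one
# uniform interface, each wrapper hides a source's local query dialect.

class SqlWrapper:
    """Would rewrite the global query into SQL (e.g. ... WHERE body LIKE '%q%')."""
    def __init__(self, rows):
        self.rows = rows                              # stand-in for a relational source
    def search(self, query):
        return [r for r in self.rows if query in r]   # simulated execution

class FileWrapper:
    """Would scan documents; here a plain keyword filter over lines."""
    def __init__(self, lines):
        self.lines = lines                            # stand-in for a document source
    def search(self, query):
        return [l for l in self.lines if query in l]

class Mediator:
    """Gives the user the impression of one homogeneous, centralized system."""
    def __init__(self, wrappers):
        self.wrappers = wrappers
    def search(self, query):
        hits = []
        for wrapper in self.wrappers:
            hits.extend(wrapper.search(query))
        return hits

mediator = Mediator([SqlWrapper(["CV Ali", "rapport 2014"]),
                     FileWrapper(["CV Sara"])])
hits = mediator.search("CV")    # results gathered from every source
```

In a real system each wrapper would translate the global query into its source's language (SQL, OQL, a search-engine request) and execute it remotely; the mediator only merges the answers.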
Nevertheless, taking the user profile into account implicitly remains an open problem. [10] propose a hybrid approach combining the mediator approach for integrating external sources with the data-warehouse approach for integrating their data. This hybrid approach covers, on the one hand, the automated generation of mappings between the ontology and a new source to be integrated and, on the other hand, the automatic construction of adapters, from the description of the abstract content of this new source down to data extraction. Such approaches require fundamental operations to handle the heterogeneity problem: in the virtual approaches, the adaptation of queries from the global schema to the local schemas; in the materialized approaches, the construction of the warehouse from heterogeneous sources, a tedious task whose data, moreover, are not always fresh. All these operations are difficult to implement, which lengthens their time and raises their cost, and they remain weak at generating semantic links and at adapting access to users' needs. A comparative study of these two integration approaches allowed us to build the following table.

TABLE I. COMPARISON OF THE INTEGRATION APPROACHES
Criterion | Virtual approaches | Materialized approaches
Principle | Unified interface | Copy of the sources
Freshness | Fresh, real-time data | Historized, non-volatile data
Performance | The main challenge | Good
Operations on the data | None | Extraction, transformation, cleaning, filtering
Operations on the sources | Access to the sources | Warehouse loading and direct access to the warehouse
Operations on the queries | Adaptation from the global schema to the local schemas, fragmentation | None
Operations on the results | None | None
Evolution | None | Frequent and extensive
Type of queries | Simple but with long transactions | Complex and costly
Personalization | Query reformulation | Multidimensional data
Complexity | Maximal | Minimal

In this light, our work addresses the implicit personalization of mediation systems, in the context of handling the heterogeneity of information sources in large enterprise systems. The second point to explore is therefore a synthesis, obtained after an in-depth study, of the different techniques for building and managing user profiles.

III. USER-PROFILE MODELING APPROACHES
Studies show that the weakness of classical information retrieval systems (IRS) lies partly in the fact that they consider the user's information need to be completely represented by the query, and do not take the user into account in the information-access chain. Work has therefore turned toward a new generation of search engines based on contextual IR, whose objective is to deliver information that is relevant and appropriate to the context of the user who issued the query [11] [12]. Personalized IR is a kind of contextual IR that emphasizes the use of a previously built user model called a profile [13].
Modeling the user profile is a central task in the IR process. It requires, on the one hand, representing the user's interests in the system and, on the other hand, adapting this representation as the user's interests change over time.

Many approaches for modeling the user profile have been proposed in the literature. Four broad classes exist:
1) History-based approaches: the profile is represented by the queries and the set of Web pages visited by the user. Raghavan and Sever [14] represent a profile by a search history serving as a database of previously executed queries together with their associated results; this database is used to select the queries most similar to the query currently being evaluated.
2) Set-based approaches: these rely on a vector representation of the user's interests as weighted keywords. Building a set-based profile relies on techniques for extracting terms from relevant documents, judged implicitly or explicitly by the user. Several personalized information-access systems adopt this type of representation, notably MyYahoo, InfoQuest, and Letizia [15].
3) Connectionist approaches: this class resolves the problems of the set-based approaches by enriching the profile with semantic relations between its constituent interests. This semantic representation is achieved by building a network of weighted concepts expressing one of the user's interests [16]. Connectionist approaches determine the semantic correlation relations between the interests of the profile, which is not the case in the previous approaches.
4) Semantic approaches: building a semantic user profile relies mainly on two sources of evidence: the user data collected during search sessions, and predefined semantic resources such as domain ontologies or concept hierarchies [19].
The user's interests are represented by a network of conceptual nodes linked to one another following the topology of the links defined in the semantic resources. In [20], a semantic user profile is defined by exploiting the ODP (Open Directory Project) domain ontology to index the content of Web pages for easy navigation by users, and by applying a method that propagates terms and their scores from the sub-concepts to the parent concept. The following table compares these different approaches according to several criteria.

TABLE II. COMPARISON OF THE USER-PROFILE MODELING APPROACHES
Criterion | Search-history-based | Set-based | Connectionist | Semantic
Information collection | Implicit | Implicit or explicit | Explicit | Explicit
Information exploited | Search history | Terms of documents judged relevant | Semantic correlation relations between interests | Domain ontology
Representation model | Database | Vector representation | Hierarchical representation | Semantic resources
Limits | Ineffective at the user's first connection to the system | Lack of structure, coherence, and semantics | Detecting a new information need for a new user query is a difficult task | Requires prior knowledge of the semantic resources (ontology)

In general, the semantic approach is the best solution for modeling the user profile. It nevertheless has a drawback in the Web-search context, where the user profile becomes associated with a large mass of ontology concepts: this can considerably increase query execution time on the one hand, and complicate the management of profile evolution on the other.
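The set-based (weighted-keyword) representation from the comparison above can be sketched with a cosine match between a profile vector and a document vector; the terms and weights below are invented for illustration.

```python
# Sketch of a set-based profile: interests as weighted keywords, matched
# to a document by cosine similarity (terms and weights are illustrative).
import math

def cosine(u, v):
    terms = set(u) | set(v)
    dot = sum(u.get(t, 0.0) * v.get(t, 0.0) for t in terms)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

profile = {"hydrocarbons": 0.8, "safety": 0.5, "hr": 0.3}
doc = {"safety": 0.9, "hr": 0.4}
score = cosine(profile, doc)   # higher = closer to the user's interests
```

This also makes the stated limitation concrete: the bare vector carries no structure or semantics, so "safety" in the profile cannot match a document that only says "protection".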
In our work, we propose an ontology-based semantic model for building a user profile and apply it in an enterprise information system.

IV. ONTOLOGICAL MODELING OF THE RESOURCES OF AN EIS
In this section we present the ontology-based modeling of the resources of an enterprise information system. We distinguish two main resources: the user (director, client, agent, employer) and the information sources (databases, Word files, Excel files, etc.). We begin by modeling the user through the definition of his profile.

A. User-profile modeling
In enterprise information systems, the heterogeneity of the information sources raises crucial problems: user disorientation, information overload, longer response times, higher indexing costs, lower search precision, and incomprehensible or ambiguous returned information. Personalization aims to integrate the user profile into the search process. We therefore propose a generic profile model allowing any type of enterprise user to use his data in a homogeneous way. The user can build his profile in two different modes: manual construction by filling in a form, and automatic construction based on interpreting the client's activities on the result data. The user-profile structure comprises two categories of information: user-oriented information and search-oriented information. The first category identifies the user through the following items:
1) Identification: attributes that help identify a person (name, gender, demographic data, nationality, contacts, etc.);
2) Affiliation: a description of the organization the user belongs to;
3) Education: the list of diplomas obtained, the specialty, the diploma title, the distinction, and the year of graduation;
4) Activities: work-related activities, the work domain, the position held, the grade, and the workplace;
5) Competences: the competences associated with the user's training and work experience.
These concept classes are linked to the main class "user-oriented information" by membership relations such as has_activity, has_identification, etc. The second category, search-oriented information, makes it easier to associate the searches made by a user with his profile. It describes the user's information needs through the following classes:
1) Interests: expressed as keywords or queries. Keywords are words or phrases describing the information sought; they define in a simple way the content or domain of the targeted items. Queries aim to restrict the search space and make a first selection of potentially interesting items. This class is the core of the profile.
2) Context: the user must describe the context in which he is immersed, in order to reduce the search space and give the correct meaning to the interest terms; otherwise the results returned in the presence of his profile risk being entirely uninteresting.
3) Search history: collected implicitly; it represents the user's trace through the set of visited Web pages, the links, and the executed queries.
4) Format of returned documents: the user can specify the document format and type (text, sound, multimedia, etc.), and these can also be deduced automatically.
5) Security: the user can specify his requirements on the security level of his personal data and of his activities during exchanges with the outside.
The following figure shows part of the user-profile ontology dedicated to handling the heterogeneity of information sources in enterprise information systems (EIS).

Fig. 1. User-profile ontology (abstract concepts) in Protégé
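As a rough sketch, the two-part profile structure described above might be carried by data classes like these; the field names are illustrative English renderings, not the ontology's actual concept identifiers.

```python
# Sketch of the generic two-part profile (illustrative field names).
from dataclasses import dataclass, field

@dataclass
class UserOriented:
    identification: dict = field(default_factory=dict)  # name, gender, contacts, ...
    affiliation: str = ""
    education: list = field(default_factory=list)       # diplomas, specialty, year
    activities: list = field(default_factory=list)      # domain, position, grade
    competences: list = field(default_factory=list)

@dataclass
class SearchOriented:
    interests: list = field(default_factory=list)       # keywords or queries (the core)
    context: str = ""
    history: list = field(default_factory=list)         # visited pages, executed queries
    formats: list = field(default_factory=list)         # text, sound, multimedia, ...
    security_level: str = "standard"

@dataclass
class UserProfile:
    username: str
    password: str
    user_info: UserOriented = field(default_factory=UserOriented)
    search_info: SearchOriented = field(default_factory=SearchOriented)

p = UserProfile("a.aggoune", "secret")
p.search_info.interests.append("human resources")
```

In the actual system these classes are ontology concepts linked by relations such as has_activity; the sketch only fixes the shape of the data.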

The generic profile model is described by three main classes (concepts): the class Profil_utilisateur, characterized by a username and a password; the class Orienté_Utilisateur, which generates the sub-classes defining an enterprise user; and the class Orienté_Recherche, containing the information allowing the EIS to be manipulated in a personalizable way. From this generic profile model, we developed a platform called HomeDoc for efficient profile management (see Figure 2). Two profile-management modes are provided: manual management, where the user interacts directly with the system, and automatic management, where the system adapts itself to the user and updates his profile with essential information such as the history, the preferred sources, and the document formats.

Fig. 2. Platform for creating a user profile

B. Modeling heterogeneous information sources
The crucial part of modeling the information sources is the description of their content: the concepts contained in an information source, and the relations between these concepts and those of a user profile. We describe an information source through the following concepts:
1) Location: defined by the Web address of a source, known by the acronym URL (Uniform Resource Locator), or by its location in memory. The title of a source can also serve as a key element for locating it.
2) Software: an information source is manipulated through a program such as a DBMS or a browser, linked by connection protocols: JDBC, ODBC, API.
3) Query mode: each information source has a query mode for accessing its data, either via a database query language (SQL, OQL) or through search engines such as Yahoo or Google.
4) Nature: an information source may be a social network, an RSS feed, a professional site, an encyclopedia, etc.
5) Language: an information source may use several languages for its data.
6) Keywords: a list of keywords reflecting the content of a source.
The generic model of an information source in an EIS is represented graphically in the following figure.

Fig. 3. Concepts of the information-source model

An example of an information source of the SONATRACH enterprise for the protection of human resources is defined in the following table:

TABLE III. EXAMPLE OF AN INFORMATION SOURCE
Enterprise | Société Nationale pour la Recherche, la Production, le Transport, la Transformation et la Commercialisation des Hydrocarbures
Location | ravail.pdf
Software | Browser, PDF reader
Query mode | Google search engine
Nature | PDF document
Language | French, English and Arabic
Keywords | Health, worker, accident, medical surveillance

To show how the user-profile model is linked with the information-source model, the remainder of this paper presents the general architecture of the personalized mediation system dedicated to handling the heterogeneity of information sources in an EIS.

V. GENERAL ARCHITECTURE
We propose a personalized, profile-based mediation architecture to handle the problem of heterogeneous information sources in large enterprise information systems. The diagram below illustrates the general architecture of the EIS.
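The six description concepts can be sketched as a small record, instantiated loosely on the SONATRACH example of Table III; the truncated location string is reproduced verbatim from the table, and the other values are transcribed from it.

```python
# Sketch of the six-concept information-source description.
from dataclasses import dataclass

@dataclass
class InformationSource:
    location: str        # URL or memory location, plus possibly a title
    software: list       # DBMS, browser, PDF reader, ...
    query_mode: str      # SQL, OQL, or a search engine
    nature: str          # social network, RSS feed, encyclopedia, ...
    languages: list
    keywords: list       # terms reflecting the content of the source

src = InformationSource(
    location="ravail.pdf",        # truncated in Table III; kept verbatim
    software=["Browser", "PDF reader"],
    query_mode="Google search engine",
    nature="PDF document",
    languages=["French", "English", "Arabic"],
    keywords=["health", "worker", "accident", "medical surveillance"],
)
```

Each field corresponds to one of the numbered concepts above, so an ontology individual for a source maps one-to-one onto such a record.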

Fig. 4. General architecture of our system

The general architecture of our work relies on a mediation system, which is the most common solution for linking different heterogeneous data sources [22]. The proposed architecture adapts the heterogeneous data sources to the user profile before integrating them into the mediation system. This adaptation proceeds in two phases. The first phase (1) discovers the information sources most preferred by the user. In this phase, the profile model and the information-source model communicate by matching the terms of the two resources. The following figure illustrates the process of personalizing the information sources to the user profile.

Fig. 5. Process of discovering personalizable sources

This process extracts the instances of each concept of the profile model and projects them onto the instances of the source model in order to filter the sources that match the user's preferences. The result of this process is a set of sources preferred by the user. The second phase uses these preferred sources to execute the user query Q over the different sources, using the adapters of each source in the mediation system. In step 3 of Fig. 4, the adaptation is performed on the search results, which must be made homogeneous according to the information in the user profile. In this case, after a query is executed, the response formats are transformed according to the information found in the Format de données class of the Orienté_Recherche superclass of the user-profile model.
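The discovery phase (1) described above can be sketched as a simple term-matching filter. The keyword-overlap scoring and the dict layout below are illustrative assumptions; the paper only states that the terms of the two models are matched.

```python
def discover_preferred_sources(interests, sources):
    """Phase (1) sketch: keep the sources whose keywords overlap the
    profile's interests, ranked by overlap size (descending).
    interests: list of profile keywords.
    sources: list of dicts with "name" and "keywords" fields."""
    wanted = {k.lower() for k in interests}
    scored = []
    for src in sources:
        overlap = wanted & {k.lower() for k in src["keywords"]}
        if overlap:
            scored.append((len(overlap), src["name"]))
    scored.sort(reverse=True)          # best-matching sources first
    return [name for _, name in scored]
```

The returned list would then be handed to phase (2), where the query Q is executed through the adapters of each retained source.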
To illustrate the benefit of our personalized mediation architecture, we give an example of an EIS showing how the different processes unfold using the two generic models (user profile and source description). Consider a department head who wants to retrieve the CV of each employee attached to his department. Without personalization, he would have to consult all the sources existing in the company. With our architecture, each company user has a profile; once the department head connects to the EIS, a list of preferred sources is ranked at the top and automatically integrated into the mediation system. The CVs are then retrieved according to the ontological description of the sources and the department head's query. Of course, depending on the security level of the profile, some information is hidden from some employees and visible to others. The objective of personalizing the mediation system is therefore to reduce the number of sources by targeting them more precisely, with the help of the profile and source models. Source heterogeneity is addressed through personalized access to the sources of the EIS, which also improves response time. Consequently, our architecture still needs to be validated on a real example in order to analyze its computational complexity and to measure the relevance of the results against user judgments.

V. CONCLUSION

This paper describes the architecture of a personalized mediation system that selects the information sources relevant to a user profile within an enterprise information system. These sources are ranked by their importance, determined through the communication between the two generic models: the user profile and the information-source description.
The heterogeneous sources are then adapted through a mediation system. We presented an extended user-profile structure with two categories of information: on one side, the user-oriented information, which identifies the user through a set of classes; on the other side,

the search-oriented information, which forms the core of the profile and describes the user's centers of interest and the characteristics of the returned documents. Consequently, an experiment on the mediation system of a given company is planned as future work.

REFERENCES
[1] Tamine-Lechani, L., Boughanem, M., Daoud, M.: Evaluation of contextual information retrieval effectiveness: overview of issues and research. Knowledge and Information Systems, 2010, Vol. 24, Issue 1, pp. 1-34.
[2] Calvier, F.E., Reynaud, C.: Une aide à la découverte de mappings dans SomeRDFS. EGC 2008, Sophia-Antipolis, Revue RNTI-E-11, Cépaduès Éditions, 2008.
[3] Amir, S.: Un système d'intégration de métadonnées dédiées au multimédia. Actes du XXVII congrès INFORSID, Toulouse, 2009.
[4] Teodoro, D., Pasche, E., Wipfli, R., Gobeill, J., Choquet, R., Daniel, C., Ruch, P., Lovis, C.: Integration of biomedical data using federated databases. Proceedings, annual meeting, 2009.
[5] Tranier, J., Baraër, R., Bellahsène, Z., Teisseire, M.: Where's Charlie: Family-Based Heuristics for Peer-to-Peer Schema Integration. IDEAS, 2004.
[6] Reynaud, C., Safar, B.: Construction automatique d'adaptateurs guidée par une ontologie. Revue TSI, numéro spécial Web Sémantique, 2011, Vol. 28.
[7] Wehrle, P., Tchounikine, A., Miquel, M.: Grid Services for Efficient Decentralized Indexation and Query Execution on Distributed Data Warehouses. 19th International Conference on Advanced Information Systems Engineering (CAiSE 07), 2007.
[8] Ravat, F., Teste, O.: Personalization and OLAP Databases. New Trends in Data Warehousing and Data Analysis, 2008, Vol. 3.
[9] Mansmann, S., Scholl, M.H.: Exploring OLAP Aggregates with Hierarchical Visualization Techniques.
In SAC 07, 2007.
[10] Reynaud, C., Pernelle, N., Rousset, M.C., Safar, B., Saïs, F.: Data Extraction, Transformation and Integration Guided by an Ontology. Chapter 2 of Data Warehousing Design and Advanced Engineering Applications: Methods for Complex Construction, Advances in Data Warehousing and Mining Book Series, Ladjel Bellatreche (Ed.), 2009.
[11] Bouramoul, A., Kholladi, M.-K., Doan, B.-L.: Using Context to Improve the Evaluation of Information Retrieval Systems. International Journal of Database Management Systems (IJDMS), 2011, Vol. 3, No. 2.
[12] Tamine-Lechani, L., Boughanem, M., Daoud, M.: Evaluation of contextual information retrieval effectiveness: overview of issues and research. Knowledge and Information Systems, 2010, Vol. 24, Issue 1, pp. 1-34.
[13] Varshavsky, R., Karmon, K., Sitton, D., Lihaiani, L., Heckerman, D., Davidson, R.: Content personalization based on user information. Patent application publication, Microsoft Corporation, USA, 2010, pp. 1-16.
[14] Marie, L., Julien, B., Valentin, B., Philippe, D., Lucie, D., Françoise, G.: Personnalisation de l'apprentissage : comparaison des besoins et approches à travers l'étude de quelques dispositifs. STICEF, 2012, Vol. 19.
[15] Zemirli, W.N.: Modèle d'accès personnalisé à l'information basé sur les diagrammes d'influence intégrant un profil multidimensionnel. Thèse de doctorat, Université Paul Sabatier, Toulouse, France.
[16] Loeb, S., Panagos, E.: Information filtering and personalization: Context, serendipity and group profile effects. Consumer Communications and Networking Conference (CCNC), IEEE, 2011.
[17] Koutrika, G., Ioannidis, Y.: A unified user profile framework for query disambiguation and personalization. Proceedings of the Workshop on New Technologies for Personalized Information Access.
[18] Laborie, S., Euzenat, J., Layaïda, N.: Semantic Adaptation of Multimedia Documents. Multimedia Tools and Applications, 2011, Vol. 55, No.
3.
[19] Daoud, M., Tamine, L., Dinh, B.D., Boughanem, M.: Contextual Query Classification for Personalizing Informational Search. Web Information Systems Engineering, Kerkennah Island, Sfax, Tunisia.
[20] Dromzée, C., Laborie, S., Roose, P.: Profil générique sémantique pour l'adaptation de documents multimédias. 30ème congrès INFormatique des ORganisations et Systèmes d'Information et de Décision (INFORSID), Montpellier, France, pp. 1-16.
[21] Kostadinov, D., Peralta, V., Soukane, A., Xue, X.: Intégration de données hétérogènes basée sur la qualité. Management Science, 2012, Vol. 31, No. 2.
[22] Kostadinov, D., Bouzeghoub, M., Lopes, S.: Accès personnalisé à des sources de données multiples. Rapport interne, Laboratoire PRiSM, Université de Versailles, France.
[23] Zaoui, I., Wadjinny, F., Chiadmi, D., et al.: A Personalization Layer for Mediation Systems. Journal of Digital Information Management, 2012, Vol. 10, No. 1.
[24] Bentayeb, F., Boussaid, O., Favre, C., et al.: Personnalisation dans les entrepôts de données : bilan et perspectives. In: EDA.

Segment-based Local Density Estimation for VANETs

Noureddine Haouari, Samira Moussaoui
Faculty of Electronics and Computer Science
University of Science and Technology Houari Boumediene
BP 32 EL ALIA BAB EZZOUAR, Algiers, ALGERIA

Abstract: Local vehicle density estimation is an essential part of many applications in Vehicular Ad-hoc NETworks (VANETs), such as congestion control and traffic state estimation. This estimation is used to obtain an estimated number of neighbours within the transmission range. Many applications use beacons to estimate this density. However, many studies show that the reception rate can drop significantly at distances that are short relative to the transmission range. This is due to special characteristics of VANETs such as high mobility and high density variation. To enhance the performance of these applications, an accurate estimation of the local density with minimum overhead is needed. Most of the strategies proposed in the literature address global traffic density estimation without special attention to local density estimation. This paper proposes a segment-based approach for local density estimation in VANETs. The simulation results show that our strategy achieves interesting estimation accuracy with lower overhead.

Index Terms: local density estimation, VANETs, ITS

I. INTRODUCTION

Knowing the number of neighbors in the transmission range is a key parameter of many applications in VANETs, such as congestion control and traffic density estimation. In fact, the local density information is used to adapt the functioning of these applications to different densities, as vehicular environments are characterized by high density variation. Therefore, the density information helps these applications react according to the density. The local density estimation in many VANET applications is based on beacons.
However, according to [11], in situations with a high message load, the reliable transmission range is reduced by up to 90%. This degradation causes very limited awareness of the neighborhood, which can disturb the functioning of this kind of application. For instance, in congestion control protocols, the local density can be used to detect network congestion by estimating the load generated on the control channel, given the total number of neighbors and the load generated per vehicle. The estimation error has a considerable impact on the proper functioning of such applications, so they need an accurate local density estimation strategy to perform well. Various strategies dealing with density estimation have been proposed in the literature. Most of them are interested in the global density, such as [2][10][5][1]. These methods give less accurate results for small distances. The approach most closely related to our strategy is [8]; a detailed description of this approach, as well as a comparison with ours, is provided in Section 3. In [12], the authors propose D-FPAV as a congestion control protocol for VANETs. In D-FPAV, every vehicle needs to find out the total number of neighbors in its transmission range. For that, every vehicle periodically sends a piggybacked beacon containing its neighbors. When a vehicle receives these messages, it becomes aware of all the vehicles in its transmission range. This information is later used to estimate the load on the channel. However, this approach requires a high overhead to work appropriately. In [8], the authors address the high overhead problem of D-FPAV. They propose the DVDE strategy to overcome the overhead generated by the extended beacons. Their approach is based on segmenting the transmission range: instead of sending the list of neighbors in the extended beacons, the vehicles periodically send the density of each segment.
To estimate the local density, the vehicle chooses the information of the vehicle nearest to the center of each segment, based on the received information. If the segments are not the same, it uses linear interpolation to estimate the density. By gathering this information from different vehicles, the vehicle enhances the accuracy of its estimation. This approach reduces the overhead compared with the D-FPAV method and has better accuracy. However, it suffers from some shortcomings. In the DVDE strategy, each vehicle has its own segments (Fig. 1). Because the segments differ in most cases, the authors propose to use linear interpolation to estimate the density of a target segment. This can give less accurate results if the vehicles are not uniformly distributed. Furthermore, the information shared in periodic extended beacons can be useless if many vehicles in the same area share the same information. This periodic redundancy creates extra overhead that could be avoided if only some selected vehicles shared their information.
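The DVDE interpolation step just described can be sketched as ordinary linear interpolation between segment centers. The function and data layout below are illustrative assumptions, not the DVDE authors' code.

```python
def interpolate_density(seg_centers, seg_densities, target_center):
    """Estimate the density at a target segment's center from a
    sender's (differently aligned) segments by linear interpolation.
    seg_centers: positions of the sender's segment centers (m).
    seg_densities: density reported for each of those segments."""
    pts = sorted(zip(seg_centers, seg_densities))
    if target_center <= pts[0][0]:
        return pts[0][1]           # clamp below the first center
    if target_center >= pts[-1][0]:
        return pts[-1][1]          # clamp above the last center
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= target_center <= x1:
            t = (target_center - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
```

As the text notes, this estimate degrades when vehicles are not uniformly distributed within a segment, which is one motivation for SLDE's fixed segments.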

Fig. 1. DVDE segmentation strategy

Our goal is to provide an accurate local density estimation strategy with minimum overhead. We consider the local density to be the number of vehicles within the transmission range, where the transmission range is the range within which vehicles can receive packets even if they are not able to decode them. In this paper, we present Segment-based Local Density Estimation (SLDE), a local density estimation strategy. The simulation results show that our approach has higher accuracy with lower overhead compared with the DVDE strategy [8]. The paper is organized as follows. Section 2 introduces the SLDE strategy. The simulation and the evaluation of SLDE are presented in Section 3. Section 4 concludes the paper with an outlook on future work.

II. METHODOLOGY

In this section, we present SLDE (Segment-based Local Density Estimation). SLDE is designed with the goal of providing an accurate estimated number of neighbors within the transmission range with a minimum amount of overhead. SLDE is designed under the following assumptions:
- Vehicles are equipped with omnidirectional antennas with the same receiving sensitivity and transmission range.
- Each vehicle is aware of its velocity and its geographical location through a Global Positioning System (GPS).
- All vehicles can identify the segment they are in, based on preloaded digital maps.

The general SLDE algorithm is shown in Algorithm 1. The propagation of the density can be started when a vehicle passes the center of a segment or when it receives density information from a vehicle in another segment. In the first case, the propagation of density information is started if the last received information about this segment is outdated. This information is provided by the extended beacons, which contain the density histogram of the different segments in the transmission range of the sender.
In the second case, before propagating the density information, the vehicle waits for a time based on its distance to the center of its segment, so that the vehicle nearest to the center propagates the density information first. When the other vehicles in that segment receive this information, they stop waiting and cancel their own propagation. Before propagating the density information, the vehicle merges the different pieces of received density information. In the following subsections, we discuss in more detail three main features of SLDE: using fixed segments, data merging, and density propagation.

A. Using fixed segments

The roads are supposed to be segmented (Fig. 2). Vehicles use preloaded digital maps to determine the identity of each segment; the identification process of segments is out of the scope of this paper. The use of fixed segments unifies the segments for all vehicles, which makes using the density information easier. Moreover, it makes the estimation more accurate because it avoids linear interpolation.

Fig. 2. SLDE segments

B. Density merging

The density merging is done during extended-beacon construction. The merging of the received data is based on choosing the most reliable information for each segment. This is done by using, for each segment, the nearest vehicle with valid data: the nearer a vehicle is to the center, the higher the probability that it detects the vehicles in that segment. If no extended beacons are received from vehicles nearer to a specific segment than the vehicle itself, the vehicle uses the valid received beacons. In the end, the vehicle has a density histogram that contains the density of the different segments.

C. Density propagation

Density propagation is an important element for achieving higher accuracy, especially for the far segments where the probability of reception is relatively low.
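The merging rule of subsection B above, keeping the report from the sender nearest to each segment's center, can be sketched as follows. The data layout (tuples and dicts) is an assumption made for illustration.

```python
def merge_density_info(received, own_estimates):
    """Sketch of SLDE's density merging: for each segment, keep the
    report whose sender was nearest to that segment's center, since
    nearer vehicles detect that segment's vehicles with higher probability.
    received: iterable of (sender_distance_to_center, segment_id, density).
    own_estimates: dict segment_id -> (distance, density) from own beacons."""
    best = dict(own_estimates)
    for dist, seg, dens in received:
        if seg not in best or dist < best[seg][0]:
            best[seg] = (dist, dens)   # closer sender wins
    # resulting density histogram: segment_id -> density
    return {seg: dens for seg, (_, dens) in best.items()}
```

The resulting histogram is what the elected vehicle of each segment then propagates in its extended beacon.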
For that, in each segment one vehicle takes responsibility for merging the received data and propagating it. The most important requirements in this process are timely and reliable transmission of the density information. In SLDE, there are two ways to start the propagation of density information. The first is when a vehicle passes the center of a segment: the density information is propagated if the last received density of this segment is outdated (information is outdated when it was received more than DeltaT ago). The other possibility is when a vehicle receives density information. Here, every vehicle that receives the density information waits for a period of time before propagating it, where the nearer the vehicle is to the center, the shorter the wait. When the vehicles receive the density information from a vehicle of their own segment, they stop waiting, because a vehicle of their segment has taken on the task of propagation. A vehicle takes on the task of propagation when its TimeToWait is consumed without receiving density information from its segment for at least a period T. To make the different propagations more synchronous, formula (1) is used to

choose the nearest vehicle to the center, by calculating the time to wait before propagating the density information:

TimeToWait = (D_{x,segment} / (SL/2)) * LTEB    (1)

where
D_{x,segment}: the distance between the vehicle and the center of its segment (m)
SL: segment length (m)
LTEB: lifetime of an extended beacon (s)

III. EVALUATION AND SIMULATION

In this section, we evaluate the performance of our strategy SLDE compared with DVDE.

A. Simulation Environment

The simulation results presented in this paper were obtained using the NS-2 simulator [4]. We chose NS-2 for its credibility in the network research community. For more realistic results, we used an overhauled MAC/PHY model [3] adapted to the characteristics of IEEE 802.11p (the standard for inter-vehicle communications). The simulation parameters used are listed in Table 1. Due to the limitations of deterministic radio models, we used the probabilistic Nakagami propagation model [9]. Our simulation mobility scenario consists of a bidirectional highway 2 km long with 3 lanes per driving direction. The highway is straight and has no entrances or exits. SUMO [6] was used to generate the movement pattern of the vehicles; we use this tool because it is open source, highly portable, and can simulate both microscopic and macroscopic environments. The mobility scenarios reflect different levels of service on highways; for each level of service we used the highest possible density, based on the Highway Capacity Manual [7]. The densities used are 7, 11, 16, 22, and 25 v/km/l (vehicles per kilometer per lane). We analyzed different values of the segment size and of T to determine the most appropriate ones for high precision with minimum overhead. The results are shown in Table 2. The best mean error over the three scenarios is 5%, with an overhead of bytes/second, using 20 m as segment size and 0.9 for T.
And if we accept an error of 12%, we obtain just bytes/second by choosing 100 m as segment size and 0.5 for T. Therefore, these two cases were chosen for further simulations. We also analyzed different segment sizes for the DVDE strategy. The lowest error ratio found for DVDE is 0.2, with a mean overhead of bytes/second, using 20 m as segment size. With 250 m as segment size, the error ratio is about 0.21 with less overhead ( bytes/second). These two cases were chosen for further simulations to compare against our strategy SLDE.

TABLE I. SIMULATION CONFIGURATION

(a) Medium access and physical layer configuration parameters for IEEE 802.11p:
Frequency: 5.9 GHz
Data rate: 6 Mbps
Carrier sense threshold: -96 dBm
Noise floor: -99 dBm
SINR for preamble capture: 4 dB
SINR for frame body capture: 10 dB
Slot time: 16 us
SIFS time: 32 us
Preamble length: 40 us
PLCP header length: 8 us

(b) Simulation setup:
MAC: IEEE 802.11p
Beacon generation: 10 beacons/s
Beacon lifetime: 0.1 s
Extended beacon lifetime: 0.5 s
Packet size: 400 bytes
Maximum vehicle velocity: 30 m/s
Transmission range: 500 m
Road length: 2 km
Radio propagation: Nakagami
Number of vehicles: 84, 132, 192, 256, 300

TABLE 2. ANALYZING CONFIGURATION PARAMETERS FOR SLDE
Segment size (m)  T (s)  Error (%)  Overhead (bytes/second)

B. Performance Metrics

Two performance metrics were used to evaluate the performance of SLDE:
Overhead: the overhead generated per second by all vehicles (bytes/second).
Error ratio: calculated using formula (2):

ErrorRatio = |EN - RN| / RN * 100%    (2)

where
EN: the estimated number of neighbors in the transmission range
RN: the real number of neighbors in the transmission range

C. Overhead evaluation

The overhead is computed as the mean number of extended beacons sent per second multiplied by the size of each packet, which depends on the number of segments. The results are shown in Fig. 3.

Fig. 3. Overhead evaluation.

We observe in Fig. 3 that the overhead increases with the density of vehicles. This is expected, because increasing the number of vehicles increases the number of extended beacons sent in both strategies. However, the overhead generated by SLDE is far less than the overhead generated by DVDE. This difference is the direct impact of the adequate use of density information and of the propagation strategy used. Indeed, the difference between the two curves is already significant at low densities, especially for SLDE(100m, 0.5s). Moreover, in that case, the overhead generated by DVDE can reach about eight times the overhead generated by SLDE at 25 v/km/l. Furthermore, at the same density, the overhead generated by DVDE is about twice the overhead generated by SLDE(20m, 0.9s). DVDE has more overhead because each vehicle is supposed to send the density information periodically, so as the number of vehicles increases, the overhead increases. In the case of SLDE, the overhead does not increase as fast as in DVDE, which makes our strategy more scalable, especially in the SLDE(100m, 0.5s) configuration.

D. Accuracy evaluation

Fig. 4 depicts the error ratio of each strategy as a function of the vehicular density. As can be observed, both versions of SLDE have a lower error ratio than the DVDE strategy. The reduced error ratio of SLDE is due to the use of fresh density histograms enabled by our propagation strategy.
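The error-ratio metric of formula (2) above is straightforward to compute; a minimal sketch:

```python
def error_ratio(estimated_neighbors, real_neighbors):
    """Formula (2): relative error of the neighbor-count estimate,
    expressed in percent."""
    return abs(estimated_neighbors - real_neighbors) / real_neighbors * 100.0
```

For example, estimating 90 neighbors when 100 are actually in range gives an error ratio of 10%.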

Fig. 4. Error ratio evaluation.

IV. CONCLUSION

Several applications in VANETs use local density estimation to adapt to different density scenarios. Their performance is directly related to the accuracy of this estimation. In this paper, we have proposed a segment-based local density estimation strategy (SLDE). The simulation results show that the proposed strategy has many advantages: it has less overhead than the DVDE strategy, and higher accuracy. Therefore, by using SLDE, the performance of many VANET applications could be improved. As future work, we intend to address unified segments and situations where not all vehicles have digital maps. We will also address using SLDE as the density estimation strategy for a congestion control protocol.

REFERENCES
[1] Nabeel Akhtar, Sinem Coleri Ergen, Öznur Özkasap. Analysis of distributed algorithms for density estimation in VANETs (poster). VNC.
[2] Maen Artimy. Local density estimation and dynamic transmission-range assignment in vehicular ad hoc networks. IEEE Transactions on Intelligent Transportation Systems, 8(3).
[3] Qi Chen, Felix Schmidt-Eisenlohr, Daniel Jiang, Marc Torrent-Moreno, Luca Delgrossi, Hannes Hartenstein. Overhaul of IEEE 802.11 modeling and simulation in ns-2. Proceedings of the 10th ACM Symposium on Modeling, Analysis, and Simulation of Wireless and Mobile Systems.
[4] Kevin Fall, Kannan Varadhan. The network simulator (ns-2). URL: isi.edu/nsnam/ns.
[5] Laura Garelli, Claudio Casetti, C. Chiasserini, Marco Fiore. MobSampling: V2V communications for traffic density estimation. Vehicular Technology Conference (VTC Spring), 2011 IEEE 73rd, pp. 1-5.
[6] Daniel Krajzewicz, Georg Hertkorn, C. Rössel, P. Wagner. SUMO (Simulation of Urban Mobility). Proc. of the 4th Middle East Symposium on Simulation and Modelling.
[7] Highway Capacity Manual.
Washington, DC.
[8] Jens Mittag, Felix Schmidt-Eisenlohr, Moritz Killat, Jérôme Härri, Hannes Hartenstein. Analysis and design of effective and low-overhead transmission power control for VANETs. Proceedings of the Fifth ACM International Workshop on VehiculAr Inter-NETworking, pp. 39-48.
[9] Minoru Nakagami. The m-distribution: a general formula of intensity distribution of rapid fading. Statistical Methods in Radio Wave Propagation.
[10] Sooksan Panichpapiboon, Wasan Pattara-atikom. Evaluation of a neighbor-based vehicle density estimation scheme. ITST, International Conference on ITS Telecommunications.
[11] Robert Karl Schmidt, Thomas Köllmer, Tim Leinmüller, Bert Böddeker, Günter Schäfer. Degradation of transmission range in VANETs caused by interference. PIK - Praxis der Informationsverarbeitung und Kommunikation, 32(4).
[12] Marc Torrent-Moreno, Jens Mittag, Paolo Santi, Hannes Hartenstein. Vehicle-to-vehicle communication: fair transmit power control for safety-critical information. IEEE Transactions on Vehicular Technology, 58(7).

CCBRP: A protocol based QoS for mobile multimedia sensor networks

Merzoug S.¹, Derdour M.¹, Gharzouli M.²
¹ LRS, University of Tebessa, Mathematics and Computer Science Department
² Faculty of NTIC, University of Constantine 2

Abstract. The various recent applications of wireless multimedia sensor networks (WMSNs) build on technological advances in micro-electromechanical systems. They promote the development of a powerful class of sensor-based distributed intelligent systems capable of ubiquitously retrieving multimedia information. In this context, a large number of audio and video streams are processed and transmitted to the sink for more precise reactions and decisions, founded on the results collected by the sensors. These streams impose strong requirements in terms of bandwidth, transmission delay, and energy management. A very large number of QoS-based protocols exist for WMSNs, but they do not take into account the generation of large-scale multimedia data. In this paper, we analyze the most representative QoS-based protocols in order to understand their basic design principle, which is to maximize QoS. We also present a QoS-based protocol for WMSNs and its hybrid architecture based on two approaches: the cluster-based approach and the chain-based approach.

Index Terms: WMSN; QoS; QoS-based routing; multipath routing; TDMA; cluster; chain.

1. INTRODUCTION

The availability of low-cost equipment, such as CMOS cameras and microphones, has supported the development of Wireless Multimedia Sensor Networks (WMSNs): networks of wirelessly interconnected devices that are able to ubiquitously retrieve multimedia content (video and audio streams, still images, and scalar sensor data) from the environment. WMSNs are thus needed to reinforce the capacities of current wireless sensor networks, and to enable and develop new applications, such as multimedia surveillance sensor networks.
WMSNs introduce several new research challenges, mainly related to the mechanisms that provide quality of service at the application level (such as latency reduction). In this paper, we present a new approach with a hybrid protocol, an operational architecture, and some new basic concepts. We also explain how to combine the chain-based communication approach with the cluster-based communication approach. More precisely, this paper presents a hierarchical routing protocol that is effective for managing the bandwidth, transmission delay, and energy of WMSNs. This protocol adopts an architecture model based on two routing approaches, cluster and chain, to estimate the available bandwidth and produce routing paths.

2. RELATED WORKS

The availability of multimedia devices of miniature size and low cost has led to considerable growth in the development and deployment of WMSNs. It is desirable to extract more realistic and accurate information from rapidly changing events in areas such as military, emergency and rescue, surveillance and security, health, traffic, industry, environment, and equipment monitoring [1]. Because of this growing potential use of WMSNs, much research has focused on providing quality of service. SPEED [2] is a spatiotemporal, priority-based, QoS-aware routing protocol for WSNs that supports real-time traffic with delay requirements and maintains a desired delivery speed across the network. The Multipath Multi-Speed (MMSPEED) [3] protocol integrates reinforcement-learning-based probabilistic multipath forwarding with the soft real-time guarantee of SPEED. It supports timeliness by ensuring bandwidth and real-time transmission constraints using SPEED, and enhances data transmission reliability by probabilistic multipath forwarding. Hamid et al. [4] propose a multi-path and multi-channel QoS-aware protocol.
The routing decision is made according to the dynamic adjustment of the required bandwidth and path-length-based proportional delay differentiation for real-time data.

Power Efficient Multimedia Routing (PEMuR) [5] is an efficient protocol for video communication over WMSNs. It ensures low power consumption over all sensor nodes and high perceived video QoS by combining hierarchical routing and video packet scheduling models. The Energy-Efficient and QoS-based Multipath Routing Protocol (EQSR) [6] applies a service differentiation technique; it is designed to satisfy the reliability and delay requirements of real-time applications. Keeping in view the limitations of table-based real-time routing techniques in WMSNs, Xue et al. propose a service-differentiated real-time communication scheme (SDRCS) [7] to provide soft real-time guarantees for event-based traffic in WSNs. SDRCS uses a grouping approach to estimate end-to-end hop distance and to meet various application requirements. W. Sun et al. propose a new routing metric called Load Balanced Airtime with Energy Awareness (LBA-EA) [8], which addresses load balance and energy awareness in WMSNs. The metric selects the less congested route and balances traffic loads among nodes by taking into account the traffic load of forwarding nodes as well as the inter-flow interference caused by contending neighbors. A comparative analysis of these protocols is given in Table 1.

Table 1: Comparative study of WMSN routing protocols

Based on the above survey, it is observed that routing protocols that are adaptive in nature are desirable for WMSNs because of their properties and key constraints. In the case of multimedia sensor networks, the protocol should provide QoS support along with energy efficiency. Moreover, if the network consists of heterogeneous nodes with varied applications, then the protocol should be intelligent enough to identify the various traffic types and their respective QoS requirements. Therefore, to meet these diverse requirements along with simplicity, it is desirable to have a multi-hop communication-based protocol with QoS support to select the best routes.

3.
Modern approaches to definition infrastructure of sensor networks After studying the types of approach to routing in WMSNs, we can inferred that the cluster approach and the chain approach are most appropriate to WMSNs in terms of prolongation of time and lifetime of the network as well as the effective management of energy consumption Cluster approach The sensor nodes are grouped into clusters controlled by a BS. Each cluster has a cluster-head (CH) node that aggregates all data sent to it by all its members and transmits data to the remote BS. Therefore, the CH node must have much more energy than the non-ch node (sensor node, relaying node). BS performs cluster formation in the network, and informs all sensor nodes of clustering information afterwards [9] Chain approach Nodes will be organized to form a chain can be either calculated by BS, so that they need to communicate only with their closest neighbors. All nodes way or completed by the sensor nodes themselves using a greedy algorithm. The aggregated form of the data will be sent to the base station by any node in the chain and the nodes in the chain will take turns in sending to the base-station [10]. The cluster approach (clustering) proposed for the first time in the LEACH algorithm [11], which showed its effectiveness by comparing with other approaches (centered data, geographical) in terms of consumption and uniform dissipation of energy thus extending the life of the network, a number of more or less apparent disadvantages. We mention among them: The most distant nodes to CH nodes die quickly compared to those that are closer. The number of messages received by the CHs amounts to approximately that of the managed nodes. This leads to the rapid depletion of their energy reserves. The use of singel-hop instead of multi-hop (multihop) rapidly depletes the energy of nodes and consumes bandwidth advantage. 122 IT4OD 2014
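As a rough illustration of the cluster approach described above, one collection round can be sketched as follows (a minimal sketch, not code from the paper; the reading values, member list and averaging aggregate are assumed for illustration):

```python
def cluster_round(readings, members, ch_id):
    """One collection round in the cluster approach: every member
    sends its reading to the cluster head (CH), which aggregates
    everything and forwards a single value to the remote BS."""
    received = [readings[m] for m in members if m != ch_id]
    received.append(readings[ch_id])           # CH's own reading
    aggregate = sum(received) / len(received)  # fusion at the CH
    return {"to_bs": aggregate, "msgs_received_by_ch": len(received) - 1}
```

The sketch makes the CH's burden visible: the number of messages it receives grows with the cluster size, which is exactly the disadvantage of LEACH noted above.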

The chain-based approach, proposed for the first time in the PEGASIS algorithm [12], prevents the formation of clusters and organizes the network nodes to form a chain in which one node is selected to transmit to the BS. This reduces the energy necessary for data transmission per cycle, since the dissipation of energy is distributed uniformly over all the nodes. The distances over which most nodes transmit are much shorter compared to transmission to the leader (CH) in LEACH. The idea of combining the two approaches was proposed by T. Thua Huynh and C. Hong Seon in [13]: a chain of nearest neighbors is built from the CHs of each cluster to ensure that the CHs farthest from the BS do not die quickly. After analyzing the two approaches (cluster and chain) and the idea of combining them, we found that we can improve the chain algorithm and its combination with the cluster algorithm. The idea is to build chains starting from all the possible chains (rather than building only one chain at the level of each cluster), which leads us to propose a new QoS-based hybrid protocol combining the advantages of the two major approaches (cluster-based and chain-based).

4. Contribution

Our protocol is a hybrid, QoS-based protocol that combines the cluster approach and the chain approach in order to preserve effective treatment of multimedia traffic. Two key points (the cluster approach and the chain approach) are considered in our design. First, the cluster approach organizes the nodes belonging to the same cluster in the form of a chain, improving the dissipation of energy, which makes it possible to reduce the load on the CH (cluster-head); indeed, nodes communicate only with their immediate neighbors and not directly with their CH. Secondly, the chain approach organizes the nodes into several chains, so that they need to communicate only with their closest neighbors and take turns communicating with the base station. Figure 1 shows how the nodes are organized inside the clusters (Figure 1. Basic idea of the organization of nodes in the network).

Node N1 transmits its data to its nearest neighbor N2, which in turn aggregates the received data with its own and transmits them to its next neighbor, and so on until reaching the CH, which transmits them directly to the BS to preserve energy. So, in this new organization (clusters of chains), all cluster nodes transmit their collected data to their respective CHs by connecting through the chain, while each CH receives the data collected along its chain. The following algorithm describes the policy by which nodes choose the route to the cluster head:

Start
1: Long(P); # returns the number of hops (nodes) in path P
2: LP(Ni); # returns the list of nodes belonging to the other path
3: Select_path(P); # procedure to select the transmission path
4: Nbp(Ni); # returns the number of paths available at node Ni
5: if (Ni transmits data to the CH)
6:   while (Nbp(Ni) != 0) do
7:     if (Long(P) < Long(P') and LP(Ni) < LP(Ni)')
8:       Select_path(P);
9:     end if
10:  end while
11: end if

The number of nodes that communicate directly with the CH is considerably reduced. This implies better energy savings and extends the lifetime of the CHs; if a CH dies (depletes its energy reserves), all the nodes of its cluster lose the ability to communicate with the BS, and the entire cluster is then considered invalid (it no longer communicates with the BS). In our protocol, we therefore decided to change the CH according to the energy level, to handle the case where the CH dies.

In our protocol, we also adopted the concept of TDMA [14]: the CHs establish a transmission plan (TDMA schedule) that assigns to each node the exact time at which it must transmit its collected data. This allows nodes to turn off their radio antennas and switch to the sleeping state, which saves more energy. In addition, the use of TDMA enables us to avoid collision and interference problems between the nodes of a cluster.
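The route-selection policy in the pseudocode above (prefer the path with fewer hops, then the one sharing fewer nodes with other paths) can be sketched in a minimal, hypothetical form; the list-of-node-ids representation of a path is an assumption:

```python
def select_path(paths):
    """Pick a transmission path among a node's candidates:
    prefer the fewest hops; break ties with the fewest nodes
    shared with the other candidate paths."""
    def shared(p):
        # count nodes of p that also appear in the other candidate paths
        return sum(len(set(p) & set(q)) for q in paths if q is not p)
    return min(paths, key=lambda p: (len(p), shared(p)))
```

For example, between a 3-hop and a 4-hop candidate, the 3-hop path wins; between equal-length paths, the one more disjoint from the others is chosen, which spreads load across the chains.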

4.1. Principle of operation

The formation of clusters and chains can be performed in a centralized manner by the base station, in order to get better results in terms of an equal distribution of nodes between the clusters.

Step 1: formation of clusters
1: Nbr_cluster = read;
2: 360 % Nbr_cluster = ???

The base station starts by initializing (or re-initializing) the infrastructure: the BS broadcasts a message (Msg_BS_start) requesting node information, and each node, in listening mode, answers with a data packet containing its energy reserve and its location obtained by GPS (Figure 5. Organization of the operation of our protocol). The cluster formation method then determines, from the exact positions of the nodes, the optimal configuration to minimize the energy expended (Figure 2. Start of the function for balancing the number of nodes in the clusters; Figure 3. Balancing function).

Once the clusters are formed, the base station switches to the election of the CHs (Figure 4. Election of CHs). These are chosen in a very simple manner: only the node with the largest energy reserve is eligible to become the next cluster head. We can improve the choice of CH by adding additional selection criteria, such as the position of the CH relative to the heads of the chains, i.e. choosing as CH the node whose position is practically the nearest to all the nodes from which it will receive data.

Regarding the formation of chains, the adopted idea is that the nodes of the same cluster form a chain based on the closest neighbors. Each node receives data from one of its neighbors, fuses (aggregates) those data with its own, and sends them in turn to its other neighbor in the chain (Figure 6. Basic idea of chain formation).
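One possible reading of the formation step above (the 360 / Nbr_cluster operation suggests equal angular sectors around the BS, and CH election by largest energy reserve) is sketched below. This is an interpretation, since the paper elides the details; the node representation (x, y, energy) is assumed:

```python
import math

def form_clusters(nodes, bs, nbr_cluster):
    """Partition nodes into equal angular sectors around the BS
    (360 / nbr_cluster degrees each), then elect as CH the node
    with the largest energy reserve in each non-empty sector."""
    width = 360 / nbr_cluster
    clusters = {k: [] for k in range(nbr_cluster)}
    for nid, (x, y, _energy) in nodes.items():
        # angle of the node as seen from the BS, in [0, 360)
        angle = math.degrees(math.atan2(y - bs[1], x - bs[0])) % 360
        clusters[int(angle // width) % nbr_cluster].append(nid)
    heads = {k: max(c, key=lambda n: nodes[n][2])
             for k, c in clusters.items() if c}
    return clusters, heads
```

The position-based refinement mentioned in the text (preferring a CH close to the chain heads) could be added as a tie-breaking criterion in the `max` call.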

(Figure 7. Construction of the chains.)

The proposed algorithm starts with the farthest node, to ensure that the nodes closest to the CH have near neighbors. The neighboring distances increase gradually, since nodes already in the chain cannot be revisited. Figure 6 shows node N4 connecting to node N3, which connects to node N2, which in turn connects to node N1, in that order, until reaching the CH node. The following algorithm describes the policy of building the chains:

Start
Nbr_noeud: # the number of nodes in the cluster
Get_noeud_far(Ni): # returns the node farthest from the CH
Msg_Explore_Neighbor: # search for neighboring nodes
Msg_Neighbor: # returns the set of neighbors {N1, N2, N3, ..., Nm} of the node farthest from the CH
m: # identifier of node N
Noeud_chain: # the chain under construction
Head_chain: # the head of the chain
Second_noeud: # returns the closest node among all the neighbors Nm
Add_noeud: # procedure to add a node to the chain
Supp_noeud: # procedure that removes a node from the set of nodes belonging to the cluster
0: Nbr_noeud = N;
1: Noeud_chain = "";
2: Head_chain = Get_noeud_far(Ni);
3: N = N - Head_chain;
4: Head_chain(Msg_Explore_Neighbor);
5: router(Msg_Neighbor(N1..Nm));
6: while (N != 0) do
7:   Head_chain = Second_noeud;
8:   Add_noeud(Noeud_chain, Head_chain);
9:   Supp_noeud(N, Head_chain);
10: end while
11: End

4.2. Principle of routing

The execution of our hybrid protocol proceeds in several cycles.
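The chain-building policy above can be sketched concretely as follows (a minimal sketch, not the authors' implementation; node coordinates and Euclidean distance are assumed):

```python
import math

def build_chain_from_farthest(nodes, ch):
    """Sketch of the chain-building policy: start from the node
    farthest from the CH, then repeatedly append the nearest node
    not yet in the chain, ending at the CH."""
    remaining = {i: p for i, p in nodes.items() if i != ch}
    # head of the chain: node farthest from the cluster head
    head = max(remaining, key=lambda i: math.dist(nodes[ch], remaining[i]))
    chain = [head]
    del remaining[head]
    while remaining:
        # greedily link the closest not-yet-chained node
        head = min(remaining, key=lambda i: math.dist(nodes[chain[-1]], remaining[i]))
        chain.append(head)
        del remaining[head]
    chain.append(ch)  # the chain terminates at the CH
    return chain
```

With nodes laid out as in Figure 6 (N4 farthest from the CH), the sketch reproduces the order N4, N3, N2, N1, CH described in the text.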
Each cycle starts with an initialization phase, in which the clusters with their multiple chains are formed and the CHs are elected, followed by a transmission phase, in which the collected data are transmitted through the chains to the CHs, which in turn transmit them to the base station. All nodes must be synchronized in order to take part in the initialization phase at the same time. In order to minimize interference problems and transmission time, the duration of the initialization phase is fixed so as to be much smaller than that of the transmission phase.

Step of initialization and re-initialization. The initialization step begins with the creation of clusters. The base station uses the proposed algorithm to form the clusters; this method provides a better result in terms of cluster formation and energy conservation. After the formation of the clusters, the CHs are selected in a simplified manner: among the nodes of the same cluster, only the node that has the largest energy reserve and is closest to the BS is selected. Next, the construction of the multiple chains is addressed by applying the multi-chain formation algorithm, and each CH sends the information of each chain to the BS. We adopt time-division multiplexing (TDMA) as the medium access technique. This technique involves building a TDMA table to share the transmission time among the nodes. Since each node knows in advance the time slot it will occupy, it can move to the "asleep" state during inactive slots. If the transmission step of a node fails and a problem arises in routing data to the BS, our protocol returns to the re-initialization step to maintain the topology, and afterwards resumes transmission.

Transmission step. The transmission step is divided into several iterations, in which the nodes transmit their collected data through a chain to the CHs.
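The TDMA table the CH distributes can be sketched as follows (a hedged illustration; the slot length and the per-node interval representation are assumptions, not values from the paper):

```python
def build_tdma_schedule(chain, slot_ms):
    """Sketch of the TDMA plan: each chain node gets one fixed slot
    per iteration; outside its slot a node can sleep (radio off).
    Slots are constant, so the iteration time grows with chain length."""
    schedule = {node: (i * slot_ms, (i + 1) * slot_ms)
                for i, node in enumerate(chain)}
    iteration_time = len(chain) * slot_ms
    return schedule, iteration_time
```

Because each node knows its interval in advance, it can sleep for (iteration_time − slot_ms) of every iteration, which is where the energy saving described above comes from.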
In each iteration, a node transmits at least one data packet during its time slot previously allocated by the base station. Knowing that the time slot allocated to each node is constant, the time for each iteration of
transmission will depend on the number of nodes existing in the cluster chain. The following algorithm describes the routing policy:

START
Reception_of_a_Packet(Msg_BS_start);
Pk = Msg_BS_start;
Select_ch(); # procedure identifying the CH node
Tr_Data; # data transmission procedure
Re_Data; # data reception procedure
Re_Ch_Data; # procedure for receiving data via the CH
Sleep_Ni; # procedure to go to the sleeping state
if (Pk != null)
# initialization
1: Nbr_cluster = read;
2: 360 % Nbr_cluster = ??? # formation of clusters
3: apply the balancing function of the nodes in the clusters;
4: apply the multi-chain construction algorithm;
5: Select_ch(Ni);
# transmission
6: Tr_Data(Ni);
7: if (Tr_Data(Ni) == True)
8:   Re_Data(Ni);
9: else apply the multi-chain construction algorithm; # re-initialization step to maintain the topology
10:   go to instruction 6;
11: if (Ni == CH)
12:   Re_Ch_Data(Ni);
13:   Tr_Data(Ni);

5. Discussion

The cluster-based communication approach was proposed for the first time in the LEACH protocol. However, this approach does not answer several challenges at the cluster level. For example, the energy consumed by the communication of the nodes with the CH node is much higher than in our protocol, which adapts an idea from the PEGASIS protocol, namely chain-based communication at the cluster level, in a way that creates several chains of nodes for sending the data. Another major disadvantage is that the chain construction algorithm of PEGASIS does not find all the possible combinations of chains. The combination of the two approaches proposed by T. Thua and C. Seon inherited the disadvantage of the chain approach, but eliminated the disadvantage of the cluster approach.
Our hybrid protocol is based on the organization of the nodes of a cluster in the form of chains, with the chains built so as to find all the possible combinations, which enables multi-chain routing. This makes it possible to improve the quality of service (QoS) in WMSNs and to take multimedia traffic into account.

Table 2: Comparative study of routing protocols relative to our protocol CCBRP

Protocol              | Motivation                                  | Routing type       | Performance parameters
LEACH                 | Efficient use of resources                  | Mono-path routing  | Network lifetime, transmission delay
PEGASIS               | Fault tolerance                             | Mono-path routing  | Data delivery rate
T. Thua and C. Seon   | Reliable data transmission                  | Mono-path routing  | Reliability, network lifetime
CCBRP (our protocol)  | Fault tolerance, reliable data transmission | Multi-path routing | Transmission delay, packet loss rate, route maintenance, network lifetime

The important points in the development of our protocol are:
- Adopt a TDMA mechanism to improve and adjust energy consumption;
- Reduce the load on the CH node (communication between nodes and their neighbors does not pass through the CH);
- Establish a mechanism for better use of the bandwidth;
- Minimize the end-to-end transmission delay and the error and failure rates.

6. Conclusion

We are witnessing an evolution in the design of QoS protocols, shifting the concern from quality of service alone toward energy economy. The main objective of this paper is to present the main QoS protocols that provide a certain quality of service while minimizing energy consumption. Accordingly, our paper presents a protocol for managing both QoS and energy consumption. Combining the cluster and chain ideas reduces the problem of intra-cluster interference and the problem of the burden on the CH; in addition, our approach minimizes the delay and the loss rate. Finally, the notion of quality of service used in our paper covers traditional network parameters such as high throughput, low delay, low loss, and low energy consumption.
We do not address guaranteed quality of service, whether deterministic or probabilistic; this last point requires further research that we plan for the future.

7. References

[1] I. F. Akyildiz, T. Melodia, K. R. Chowdhury. A survey on wireless multimedia sensor networks. Computer Networks (Elsevier).
[2] K. Akkaya, M. Younis. A survey on routing protocols for wireless sensor networks. MD 21250, USA, Sept. 2005.
[3] E. Felemban, C. Lee, E. Ekici. MMSPEED: Multipath multi-SPEED protocol for QoS guarantee of reliability and timeliness in wireless sensor networks. IEEE Trans. Mobile Computing, vol. 5, no. 6.
[4] M. Hamid, M. Alam, H. C. Seon. Design of a QoS-Aware Routing Mechanism for Wireless Multimedia Sensor Networks. Proc. IEEE Global Telecommunications Conference.
[5] J. M. Kim, H. S. Seo, J. Kwak. Routing Protocol for Heterogeneous Hierarchical Wireless Multimedia Sensor Networks. Wireless Personal Communications, vol. 60, no. 3, October.
[6] T. He, J. A. Stankovic, C. Lu, T. F. Abdelzaher. A spatiotemporal communication protocol for wireless sensor networks. IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 10.
[7] Y. Xue, B. Ramamurthy, M. C. Vuran. SDRCS: A service-differentiated real-time communication scheme for event sensing in wireless sensor networks. Vol. 55.
[8] W. Sun, M. Chen. A Load-Balanced and Energy-Aware Routing Metric for Wireless Multimedia Sensor Networks. CWMMN.
[9] Long C., Sajal K., Mario D., Canfeng C., Jian M. Streaming Data Delivery in Multi-hop Cluster-based Wireless Sensor Networks with Mobile Sinks. IEEE Communications.
[10] K. Du, J. Wu, D. Zhou. Chain-based protocols for data broadcasting and gathering in sensor networks. International Parallel and Distributed Processing Symposium.
[11] M. Younis, M. Youssef, K. Arisha. Energy-aware Routing in Cluster-Based Sensor Networks. Proc. 10th IEEE/ACM International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems.
[12] S. Lindsey, C. Raghavendra. PEGASIS: Power-Efficient Gathering in Sensor Information Systems. IEEE Aerospace Conference Proceedings, vol. 3.
[13] T. Thua Huynh, C. Seon Hong. An Energy*Delay Efficient Routing Scheme for Wireless Sensor Networks. MMNS 2005, LNCS 3754, IFIP International Federation for Information Processing, 2005.
[14] Y.-Q. Song. «Réseaux de Capteurs Sans Fil : Comment Fournir la Qualité de Service tout en Économisant l'Énergie» [Wireless sensor networks: how to provide quality of service while saving energy].
[15] K. Sohrabi, J. Gao, V. Ailawadhi, G. J. Pottie. Protocols for self-organization of a wireless sensor network. IEEE Personal Communications, 2000, 7(5).
[16] F. Ye, et al. A Scalable Solution to Minimum Cost Forwarding in Large Scale Sensor Networks. Proc. International Conference on Computer Communications and Networks (ICCCN), Dallas, TX, October 2001.
[17] K. Akkaya, M. Younis. An Energy-Aware QoS Routing Protocol for Wireless Sensor Networks. Proc. IEEE Workshop on Mobile and Wireless Networks (MWN 2003), Providence, RI, May 2003.
[18] X. X. Huang, Y. G. Fang. Multiconstrained QoS multipath routing in wireless sensor networks. Wireless Networks, 2008, 14(4).
[19] H. W. Tsai, et al. Mobile Object Tracking in Wireless Sensor Networks. Journal of Computer Communications, vol. 30, no. 8, March.
[20] J. Peng, W. Chengdong, Z. Yunzhou, J. Zixi. DAST: a QoS-aware routing protocol for wireless sensor networks. International Conference on Embedded Software and Systems Symposia (ICESS Symposia).
[21] M. A. Rahman, R. GhasemAghaei, A. El Saddik, W. Gueaieb. M-IAR: biologically inspired routing protocol for wireless multimedia sensor networks. IEEE Instrumentation and Measurement Technology Conference (IMTC '08).
[22] A. Razzaque, M. M. Alam, R. Mamun Or, H. Choong Seon. Multi-constrained QoS geographic routing for heterogeneous traffic in sensor networks. IEICE Transactions on Communications.
[23] L. Chenyang, B. M. Blum, T. F. Abdelzaher, J. A. Stankovic, H. Tian. RAP: a real-time communication architecture for large-scale wireless sensor networks. Proc. Eighth IEEE Real-Time and Embedded Technology and Applications Symposium, 2002.
[24] S. Ratnaraj, S. Jagannathan, V. Rao. OEDSR: Optimized energy-delay sub-network routing in wireless sensor networks. Proc. 2006 IEEE International Conference on Networking, Sensing and Control, 2006.
[25] P. Shanghong, S. X. Yang, S. Gregori, T. Fengchun. An adaptive QoS and energy-aware routing algorithm for wireless sensor networks. International Conference on Information and Automation (ICIA).
[26] B. Deb, S. Bhatnagar, B. Nath. Reliable information forwarding using multiple paths in sensor networks. Proc. 28th Annual IEEE International Conference on Local Computer Networks (LCN '03), 2003.
[27] H. Alwan, A. Agarwal. MQoSR: A Multiobjective QoS Routing Protocol for Wireless Sensor Networks. ISRN Sensor Networks, Hindawi Publishing Corporation, vol. 2013, 12 pages.
[28] M.-A. Koulali, M. El Koutbi, A. Kobbane, M. Azizi. QGRP: A Novel QoS-Geographic Routing Protocol for Multimedia Wireless Sensor Networks. International Journal of Computer Science Issues (IJCSI), vol. 8, issue 6, no. 2, November 2011.
[29] J. N. Al-Karaki, A. E. Kamal. Routing Techniques in Wireless Sensor Networks: A Survey. IEEE Wireless Communications, vol. 11, no. 6, Dec. 2004.
[30] R. Sumathi, R. Srinivasan. QoS aware Routing for End-to-End Reliability of Delay Sensitive Events in Wireless Sensor Networks.

Bio-inspired Routing Protocol, Oriented Maintaining of Connectivity by Mobility Control

BAHLOUL Nourelhouda*, ABDESSEMED Mohamed Rida, ZIDANI Abdelmajid
Department of Computer Science, University of Batna, Batna, Algeria

Abstract — Interest in systems of autonomous mobile robots has increased with the advent of wireless technology, which makes their cooperation more effective in accomplishing the tasks they are responsible for. To do this, each robot-node navigates autonomously while remaining connected to the other nodes via a wireless medium. In the case where a transmission has already started, this connectivity can be maintained deliberately by constraining the movements of the robots that support it; in the other cases, their movements depend on the desired task. In the absence of a centralized communication infrastructure, these robots support exchanges of messages through a MANET, of which they are the nodes. Based on the operating principle of AODV, we propose here the BR-AODV routing protocol, built on the formation-keeping rules of Reynolds' Boids; it allows maximum conservation of active routes. In many real applications, communication has priority over the required task; in such circumstances, BR-AODV shows its superiority over conventional protocols. To validate this, we compare it with AODV (one of the reference protocols in MANETs). Simulation results are provided and discussed.

Keywords — Boids of Reynolds, Emergence, Maintaining of connectivity, Mobility control, Network of mobile autonomous robots.

I. INTRODUCTION

Nature has always been a source of inspiration for researchers and engineers seeking increasingly sophisticated routing solutions.
The reason is that biological systems are characterized by the pooling of a large number of autonomous entities interacting locally in order to self-organize and coordinate their actions, and thus to cooperate collectively to succeed in a given mission, resist internal and external disturbances, and adapt to detected variations. From their modular and fully distributed design emerge, at the high level of these systems, behaviors that are incredibly sophisticated. These characteristics correspond to most of the properties required of routing protocols today, and to those desired in the future. One of the most promising directions in this perspective is insect societies. Indeed, swarm intelligence [3] has been a source of inspiration for building many routing protocols ensuring autonomy, distribution, adaptability, robustness and scalability in wireless networks. This type of protocol prepares the advent of the next generation of protocols, which will be much more intelligent, so as to manage routing in networks with a high degree of autonomy [19][26]. Artificial intelligence is the field of research that covers all that relates to understanding and reproducing intelligence in all its forms. At first, this discipline focused on the human being, trying to mimic his reasoning [3]. Unfortunately, the obtained results were far more modest than expected. In nature, there are many other, much simpler forms of intelligence [3]. Swarm intelligence (SI), through the collective behavior achieved by schools of fish and flocks of birds, represents a concrete example [2][3]. At the basis of most of these collective behaviors, we find a social aspect of formation that promotes interactions and exchanges of information between individuals of the same group [11]. One of the most fruitful application areas of SI is swarm robotics (SR). When the robots are connected, this gives more credibility and efficiency to their collective behaviors.
We then speak of a network of robots organized as a swarm and based on self-organization, making behaviors emerge that preserve a biological reality through properties such as robustness, scalability, and flexibility [25]. A swarm of connected robots is often described as a complex adaptive system (CAS), where the concepts of emergence and self-organization are used jointly. It consists of a set of autonomous mobile robot-particles with a rudimentary constitution. Each robot-particle is equipped with sensors and actuators, and it is aware of its local environment [13]
[20]. This type of network already accumulates elements of intelligence and is eligible to join the class of intelligent networks [26], which know how to adapt dynamically to external and internal changes. In addition, the self-organizing capacity is supposed to generate all the self-* properties (such as self-configuration, self-optimization, self-repairing, self-protection and self-explanation) [25]. Cooperation allows a swarm of robots to carry out the required tasks effectively (such as exploration, exploitation, monitoring and environmental mapping), while the wireless communication that supports this cooperation enables it to achieve a high degree of flexibility and autonomy [8][9][16][20]. Thanks to this wireless communication, the robot swarm succeeds in self-organizing into a network without any centralized administration, once it is put in place [14][15]. Among the most sought-after aspects of the latter, we can cite self-deployment [4][5] and self-configuration [6], depending on the situations in which the swarm of robots can find itself. Such a network can be a MANET (Mobile Ad hoc Network). A MANET can be a network used by robots to communicate [15][20], as it can be supported by these robots [16][18]. In a MANET, the data transmission services adapt to changes caused by the autonomous movements of the nodes [4][5][6]. Because of this assumption of independent mobility, and because of the absence of centralized administration, a network of this type can often be partitioned [4]. This situation is not desired in many multi-robot applications, where the existence of uninterrupted communication channels between the robots involved in a given exchange is required [10]. We are then faced with the following challenge: how to coordinate the movements of the robots so that route connectivity is not compromised? This problem is already the subject of much research in the field of robot networks.
In the first works integrating multi-robot systems and ad hoc networks, the solutions to maintain connectivity were closely coupled with the applications and were conceived in an ad hoc way. A typical example is the exploration of an unknown environment by robots connected jointly in order to map the ground [16][13][14]. To keep connectivity in a decentralized manner, Vazquez et al. propose in [16] that the robots analyze the complete topology of the network to recognize the critical links; if such a link exists, the task of maintaining it takes priority over exploration. The proposal of Notarstefano et al. in [17] allows the robots to determine their displacements in a decentralized way, but while maintaining a fixed topology. The completely decentralized exploration algorithm proposed by Rooket et al. in [14] ensures that an exploration robot will not lose its connection with the system during its mission. Stump et al. in [15] managed to control the displacement of a group of exploration robots while keeping connectivity with a stationary robot in a closed environment. The exploration algorithm of Sheng et al. in [13] is distributed, but it needs global information to keep connectivity; at each planning stage, the robots exchange their bids on the frontier to be visited. In [18], Schuresko et al. propose a set of distributed algorithms to propagate the positions of the robots in the network, so that each robot computes its displacement while ensuring that connectivity between them is not compromised. Couceiro et al. in [20] suggest an algorithm for keeping connectivity within a group of robots performing an exploration task via particle swarm optimization. Zhong et al. in [22] propose a neuro-fuzzy solution to maintain connectivity between several mobile robots. The work of Ayad et al.
in [23] presents a proactive method for keeping connectivity between a group of mobile robots according to the principles of electromagnetic fields and signal strength. All these works give an outline of the key ideas behind the resolution of the problem of keeping connectivity in robot networks. Let us note that they have no relationship with the communication protocols used; this represents one of the original aspects of the work we present here. Before giving more explanation about the problem addressed here, we have to specify what is meant by "maintaining of connectivity in a robots network". Informally, we wish to propose a strategy that allows each robot to verify, at its own level: 1) whether it participates in one or several active communication paths; 2) if this is the case, it must ensure, when moving, that it will not cause any disconnection in these paths; if a disconnection problem can occur, it must plan its displacement so as to avoid it. In this way, the global behavior resulting from the interactions between the robots of the system adapts to the environmental changes due to mobility, so as to keep connectivity in the active paths. The main idea on which this work is built is the proposal of a new routing protocol oriented toward keeping connectivity, emerging from mobility control. It derives from the AODV protocol, to which a control module for robot movements is added [3][7]. This control is based on that of the Boids of Reynolds [2][3][21], which justifies the name BR-AODV. The rest of this paper is organized as follows: Section II, devoted to a state of knowledge, describes MANET networks, their routing protocols, and swarm intelligence, and lists the works related to the problem of maintaining connectivity. Section III provides the details of the treated problem. Section IV explains the proposed solution. Simulation results are presented and discussed in Section V.
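The informal strategy above (a robot on an active path only moves if it stays within radio range of its path neighbors) can be sketched as follows; this is a hedged illustration of the idea, not the BR-AODV control module itself, and the 2D positions and fixed radio range are assumptions:

```python
import math

def safe_move(pos, target, path_neighbors, radio_range):
    """Connectivity-preserving displacement check: accept the move
    to `target` only if, at the new position, the robot stays within
    radio range of its predecessor and successor on the active path;
    otherwise it keeps its current position rather than break the path."""
    if all(math.dist(target, n) <= radio_range for n in path_neighbors):
        return target
    return pos
```

A real controller would plan an alternative displacement instead of simply staying put, but the predicate captures point 2) of the strategy: no accepted move may disconnect an active path.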
Finally, the last section gives a synthesis of the work accomplished and suggests some perspectives.

II. KNOWLEDGE STATEMENT

A. Mobile ad hoc networks

A MANET is a dynamic autonomous system, composed of mobile units communicating via a radio medium without centralized administration. In this kind of topology, all nodes
cooperate to ensure appropriate management of the network (such as control and routing) [4][6]. Its dynamic topology plays an important role in the functioning of its communication protocols, which has a strong impact on its performance and reliability. This ad hoc technology is deployed on different hardware structures, such as vehicular networks (VANETs) and robot networks (RANETs) [24]. They are qualified as autonomous and intelligent when they are flexible, self-organize, learn and adapt without external intervention [26]. The management of autonomous intelligent networks is a sensitive research area. These networks are massively parallel systems, composed of many independent entities performing a predefined task. The overall behavior of the system emerges from the interaction of these independent entities. The programming paradigms are then shifting to the development of these entities, and self-organization is the solution to control the whole [26].

1) Routing in MANETs: Research in the field of routing has accompanied the evolution of networks, to adapt routing methods to the different communication technologies and to the evolution of users' needs toward Internet connectivity available anywhere and at any time. These systems are characterized by: a large heterogeneity in terms of communication technologies (protocols and services); an important dynamic, due to continuous changes of topology; a variety of traffic patterns; and a considerable number of users and active services [4]. It is clear that the autonomous and intelligent management, control and addition of new services in these complex and evolving networks require the definition of new routing protocols [6][19][26]. However, the design of routing protocols for MANETs is more difficult in comparison with conventional systems. Despite the many existing proposals in the literature, research on routing in these networks remains open.
The existing routing protocols designed for MANETs are in charge of finding a way to route data from a source node to a target one, provided that the route exists. If such a route does not exist, these protocols have no influence on the mobility of the robots that could restore or recreate it [6]. Therefore, we need to fetch inspiration elsewhere in order to find an alternative. To this effect, nature seems an inexhaustible source.

2) AODV (Ad hoc On-Demand Distance Vector): A reactive routing protocol for ad hoc networks. It allows mobile nodes to obtain routes on demand, thereby reducing the size of the routing tables at each node as well as the related control traffic, and minimizing the number of broadcast messages. Indeed, as long as the ends of a connection have a valid route connecting them, AODV does not intervene in routing. When a route is required, AODV initiates a route discovery process to connect the pair of nodes in question. It is based on two mechanisms [1][5]: (a) route discovery, and (b) route maintenance. The establishment and maintenance of routes is ensured by the exchange of different types of messages:
- Route Request,
- Route Reply,
- Hello message, and
- Route Error.
Each node maintains a routing table that contains an entry for each accessible destination. An entry in the routing table is created when a node receives a message for an unknown destination. AODV uses sequence numbers to maintain the consistency of the routing information. Because of node mobility in ad hoc networks, the routes they support may become invalid; the sequence numbers allow the use of the freshest routes [1]. A route unused for a while is removed from the routing tables for space reasons.

B. Swarm intelligence

SI is a new discipline of AI.
It tries, using the multi-agent systems model, to design intelligent artificial systems inspired by biological social systems (such as ant colonies, flocks of birds and schools of fish). The members of these societies are unsophisticated; despite this, they are able to achieve complex tasks. The coordinated behavior of the swarm emerges from relatively simple interactions between individuals. Graphic animation [2], optimization algorithms [12], swarm robotics [7][11], and routing and load balancing in telecommunication networks [19] are areas where SI principles are applied successfully. The resulting systems are characterized in particular by robustness and flexibility.

1) Swarm robotics: It is a new approach to the coordination of a large number of robots, which emerged as an application of SI to multi-robot systems. It is inspired by the observation of social animals. SR focuses on the physical embodiment of individuals and on the interactions between them and between individuals and the environment [7]. Swarms provide an increased capacity to complete a task (high fault tolerance), a low complexity of the units and, finally, a low manufacturing cost relative to traditional robotic systems. The emergence of synchronized behaviors in these systems is quite impressive for researchers working on multi-robot systems, because such behaviors emerge despite the relatively incapable individuals, the lack of centralized coordination and the simplicity of the interactions [11].

2) Boids of Reynolds: Graphic animation was probably the first discipline to exploit the discoveries about the decentralized organization of animal societies and SI.
As early as 1986, Craig Reynolds, inspired by then-recent results on the formation of schools of fish and flocks of birds, achieved a graphical application where agents he calls Boids move coherently in a virtual environment and model an emergent behavior, each Boid acting autonomously while respecting a number of simple rules [2][21]:
a) Too close to another Boid, it tries to move away.
b) Too far from the group, it tries to get closer to its nearest neighbors.
c) It continually seeks to adjust its speed to the average speed of its neighbors, and
d) Finally, it avoids obstacles that appear in front of it.

In short, each Boid seeks to maintain a minimum distance from its neighbors. This depends on its observation of its local vicinity.

III. SPECIFICATION OF THE PROBLEM

Section II.A showed that the usual routing protocols of ad hoc systems do not offer solutions to the problem of maintaining connectivity in MANETs. The conventional routing protocols assume that nodes can move without constraints and join or leave the network arbitrarily. This makes them ineffective for maintaining the connectivity required in certain applications. AODV appreciably minimizes the number of message broadcasts by creating routes on demand, maintained as long as they are in use. This protocol updates the routing tables only when necessary and saves bandwidth. Nevertheless, it appears slower; it updates its routing table before communicating, and the established paths can fail and become invalid at any moment. This induces a loss of connectivity in the network. Such a situation is not desirable in some multi-robot applications, like a rescue mission [20]. When the nodes of a MANET are robots, it is called a RANET. Our problem, then, is to study the maintenance of connectivity in such a network. The choice of a RANET is justified by the fact that its nodes can be commanded. In this case, we want to ensure the availability of reliable communication channels between all active nodes throughout a given mission. Maintaining routes between the mobile devices of a communication network is a known problem in MANETs [4][6]. Multiple routing algorithms have been developed for this reason [5][6], but all assume that the displacements of the network nodes are not controllable. This leaves only the alternative of predicting node movements according to existing models, to try to contain this mobility. However, in the case of a robot network, we have the advantage of dealing with autonomous entities that decide their own moves.
We focus, then, on this ability to control mobility in order to solve the problem of maintaining connectivity in the active paths of such networks. In this case, the problem can be reformulated as follows: how to coordinate the robots' movements so that the connectivity of the network is not compromised while transfers are in progress (and while still succeeding in the mission they are in charge of)?

IV. PROPOSED SOLUTION

Given the problem posed previously, the proposed solution is distributed, because the effort in terms of computation and communication is spread over the robots participating in the establishment of a given path. It is generic, because it is not tied to a specific application context or to a particular type of robot. The originality of the proposed solution lies in maintaining connectivity in order to maximize the lifetime of the links. To do this, we focus on mobility control, which is an integral part of the suggested routing. This solution is concretized through the protocol BR-AODV, derived from AODV; it uses the same basic elements (the same exchanged packet types with the same structures, the same routing table structure, ...) while changing its route discovery and maintenance mechanisms. BR-AODV attempts to minimize the number of launches of the route discovery process and to support the maintenance of active paths as long as they are needed by the emitting sources. To do this, it replaces the route maintenance process by a mobility control module, obtained by applying the formation-maintenance principle of the Boids of Reynolds based on distance control over the movement of robots (see II.B.2). The proposed routing scheme is applied to a network of autonomous mobile robots whose criteria are:
- The network is asynchronous and of large scale.
- The multi-hop communication is based on the AODV routing protocol.
- The network is self-directed, self-organized and consists of N nodes.
- The number of robots can change dynamically when nodes join or leave the network.
- Each node has a unique identifier ID in the network; it can be a MAC address or an IP address.

In our architecture we assume the following: each robot i has a velocity v_i and a position p_i, and each robot locally detects one or more neighboring nodes. In order for a node i to define its neighbor list Γ_i, it proceeds as follows. In reality, node i must estimate the distance between its current location and the other nodes of the network by sending them waves (Hello messages). When these waves return toward node i, it calculates their attenuation rate (signal power), and according to this delivered power it can define its neighborhood. To simulate this technique, we have transformed the problem into a geometric one, where node i calculates the distances separating it from all other nodes of the network by the Pythagorean theorem:

d^2 = (x(i) - x(j))^2 + (y(i) - y(j))^2   (1)

After the calculation of (1), the node compares the obtained distances with the radius of its scope z. If the distance between i and another node is smaller than or equal to the range of i, this node is added to the neighbor list of i. We also have to take into consideration the following assertions:
- Mobile robots are homogeneous: they are similar in their processing capacity, communication, storage and energy.
- Each node can move and leave the network arbitrarily.
- All the network nodes move in an autonomous way, except those which take part in the establishment of a route or in the routing of packets.
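The neighbor-list computation of equation (1) can be sketched as follows. This is our own illustration in Python (the paper's simulations were in MATLAB); the function and variable names are assumptions.

```python
import math

def neighbors(i, positions, z):
    """Return the ids of all nodes within range z of node i.

    positions: dict mapping node id -> (x, y) coordinates.
    Uses the distance of equation (1): d^2 = (xi-xj)^2 + (yi-yj)^2.
    """
    xi, yi = positions[i]
    result = []
    for j, (xj, yj) in positions.items():
        if j == i:
            continue
        d = math.hypot(xi - xj, yi - yj)  # Euclidean distance
        if d <= z:  # within node i's scope: add to neighbor list
            result.append(j)
    return result
```

For example, with nodes at (0,0), (3,4) and (10,10), node 0 with range z = 5 sees only the node at (3,4), whose distance is exactly 5.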

The nodes whose movement is restricted are released when a timeout expires or when the routing path to which they contribute reaches the end of its lifetime (the path has been idle for a while). The principle of the Boids of Reynolds is applied to the movements of the nodes that participate in one or more active paths: each node which contributes to routing data in these paths is handled as a Boid in a swarm, and its movement should observe the following three rules:
1) Separation: avoid collisions with the nearest neighbors, i.e., keep a minimal distance from them (see (2)),
2) Alignment: adapt its speed to that of its neighbors, and remain in the common direction of displacement (see (3)),
3) Cohesion: stay close to its neighbors while approaching the swarm center (see (4)).

In simulation terms, the problem is formalized by a group of boids B = {b1, b2, ..., bn} which represents all the robots constituting the active paths, where each boid b_i is placed at a position p_i. Γ_i is defined as the group of boids in the zone of radius z around boid b_i. Each boid moves through space with a velocity v_i. The above rules are translated by the following formulas:

v_s = - Σ_{b_j ∈ Γ_i : d(b_j, b_i) < z} (p_j - p_i)   (2)

v_a = (1 / |Γ_i|) Σ_{b_j ∈ Γ_i} v_j - v_i   (3)

v_c = c_i - p_i, where c_i = (1 / |Γ_i|) Σ_{b_j ∈ Γ_i} p_j   (4)

With these three formulas, we can compute the movement speed of each boid at time t+1 as follows:

v_i(t+1) = α v_i(t) + (1 - α) (w_s v_s(t) + w_a v_a(t) + w_c v_c(t))   (5)

where α is a smoothing parameter belonging to [0, 1]; it indicates the influence of a robot's current perception on its decision to move. w_s, w_a and w_c are weights taken in [0, 1], corresponding respectively to separation, alignment and cohesion. The resulting motion is expressed by the following formula:

p_i(t+1) = p_i(t) + v_i(t+1)   (6)

The displacements of the nodes that do not participate in the routing of active paths are not concerned by these rules; they rather obey the strategy adopted by the task to be achieved.
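The update of equations (2)-(6) for one boid can be sketched as follows. This is our illustrative 2-D reconstruction in Python, not the paper's MATLAB code; the default weight values are arbitrary.

```python
def boid_step(p, v, i, gamma_i, alpha=0.5, w_s=0.3, w_a=0.3, w_c=0.4):
    """One Boids update for robot i.

    p, v: dicts mapping node id -> (x, y) position/velocity tuples.
    gamma_i: neighbor set of i (boids within radius z).
    Returns (p_i(t+1), v_i(t+1)) per equations (2)-(6).
    """
    n = len(gamma_i)
    # (2) separation: move away from nearby neighbors
    v_s = tuple(-sum(p[j][k] - p[i][k] for j in gamma_i) for k in (0, 1))
    # (3) alignment: match the neighbors' mean velocity
    v_a = tuple(sum(v[j][k] for j in gamma_i) / n - v[i][k] for k in (0, 1))
    # (4) cohesion: head toward the neighbors' centre c_i
    c_i = tuple(sum(p[j][k] for j in gamma_i) / n for k in (0, 1))
    v_c = tuple(c_i[k] - p[i][k] for k in (0, 1))
    # (5) smoothed velocity, then (6) position update
    v_new = tuple(alpha * v[i][k]
                  + (1 - alpha) * (w_s * v_s[k] + w_a * v_a[k] + w_c * v_c[k])
                  for k in (0, 1))
    p_new = tuple(p[i][k] + v_new[k] for k in (0, 1))
    return p_new, v_new
```

With two stationary boids one unit apart, the separation term pushes boid 0 away while cohesion pulls it toward its neighbor; with the weights above, cohesion slightly dominates.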
V. SIMULATION RESULTS

Although BR-AODV uses the same basic mechanisms as AODV, it differs from the latter in several points. For instance, if we disregard the problem of node failures, BR-AODV is totally exempt from the route maintenance phase, which is replaced by the mobility control of nodes; in the case where failures are taken into consideration, this phase appears only rarely.

In this section we present the simulation results of the AODV protocol and those of the suggested protocol in order to compare them, which will allow us to evaluate our solution. To do so, we study the impact of the following factors on the performance of the two protocols:
- the network connectivity, depending on the transmission range of the nodes,
- the node mobility (speed and direction), and
- the network size (number of nodes in the network).
In our simulations, we study the impact of the factors described above on the success rate of packet delivery to the final destination, which indirectly expresses the ability to maintain connectivity in the active paths. This rate is measured by the number of packets transmitted to their final destinations, NPTFD, over the total number of packets to be transmitted, TNPT. We note μ the success rate of packet transmission; its value is calculated by:

μ = NPTFD / TNPT   (7)

The simulation scenarios were implemented under MATLAB. The various tests were performed using a MANET of 100 wireless mobile nodes scattered randomly over a 100x100 m^2 area, where each node has a uniform transmission range. Each node represents a robot; the nodes are deployed randomly in the mission space. The motion of the nodes follows the Random Way Point (RWP) mobility model: mobile nodes are initially randomly distributed over the simulation zone. A mobile node starts by staying in one place for a period of time belonging to [2, 6]. Once this period is over, the mobile node chooses a random destination and a speed uniformly distributed in the interval [0 m/s, 50 m/s]. The figures presented below show the superiority of the proposed protocol (BR-AODV) over the AODV reference protocol, via the comparison of the success rate of packet transmission for each of the two protocols, by varying the transmission range of each node (see Fig. 1), the speed (see Fig. 2) and the size of the network (see Fig. 3). Therefore, we conclude that BR-AODV confirms its performance and efficiency, as well as its adaptation, in terms of connectivity needs, to networks of autonomous mobile robots, compared to the AODV protocol.
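The simulation ingredients described above can be sketched as follows: one Random Way Point step and the success-rate metric μ of equation (7). This is our own illustration (the paper used MATLAB); the function names are assumptions, while the area, pause and speed bounds follow the text's setup.

```python
import random

def rwp_next(area=100.0, vmax=50.0, pause=(2.0, 6.0), rng=random):
    """One RWP decision: pause for a time in [2, 6], then pick a random
    destination in the area x area m^2 zone and a uniform speed in [0, vmax]."""
    wait = rng.uniform(*pause)
    dest = (rng.uniform(0.0, area), rng.uniform(0.0, area))
    speed = rng.uniform(0.0, vmax)
    return wait, dest, speed

def success_rate(nptfd, tnpt):
    """Equation (7): mu = NPTFD / TNPT."""
    return nptfd / tnpt
```

For instance, 80 packets delivered out of 100 sent gives μ = 0.8.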

Figure 1. Impact of the transmission range on the success rate of packet transmission.
Figure 2. Impact of the movement speed on the success rate of packet transmission.
Figure 3. Impact of the network size on the success rate of packet transmission.

VI. CONCLUSION

The use of MANET technology as a communication medium in systems of mobile autonomous robots is becoming increasingly widespread; connectivity, as permanent as possible, between the robots involved in information transmission phases is generally desirable. It even becomes imperative in many real situations where information must be sent to the decision center as soon as it is collected, such as in rescue missions. In order to solve this problem of route breaks during the data transmission phase (caused by the mobility of nodes in a network of autonomous mobile robots), our proposal is based on the original concept of the Boids of Reynolds applied to the reference routing protocol, AODV. The resulting protocol, BR-AODV, is more appropriate and faces the constraints of a mobile ad hoc network more effectively; it is more flexible, scalable and able to adapt to internal network situations. It also provides better availability of the already active routes and much better performance within the RANET considered in this study. Unfortunately, the topological changes due to robot dysfunctions and energy depletion remain problematic. This is a part of our future concerns that we plan to treat soon.

Computer systems that have self-* properties and adapt to changes in their environment, without centralized administration, are called "organic computing systems". This can be the case of the network studied here. The resulting behavior of such a network can be classified as "emergent", since it is quite difficult to predict its response. Self-organization of the system can show a negative emergent behavior that is neither wanted nor planned in the design of our protocol. The idea we plan to experiment with, as a complement, is the control of this undesirable emergence in order to avoid its effects and achieve the desired quality of connectivity maintenance. This will project us into the field of organic computing [25]. In addition, we plan to try to resolve this optimization problem via the "PSO" meta-heuristic.

REFERENCES

[1] C.E. Perkins and E.M. Royer, Ad-hoc on-demand distance vector routing. In: Mobile Computing Systems and Applications (WMCSA'99), Second IEEE Workshop, 1999.
[2] C.W. Reynolds, Flocks, herds and schools: A distributed behavioral model. In: ACM SIGGRAPH Computer Graphics, vol. 21, July.
[3] J. Kennedy and R.C. Eberhart, Swarm intelligence. Morgan Kaufmann.
[4] C.E. Perkins, Ad hoc networking. Addison-Wesley Professional.
[5] B. Karthikeyan, N. Kanimozhi and S.H. Ganesh, Analysis of reactive AODV routing protocol for MANET. In: Computing and Communication Technologies, IEEE.
[6] M. Abolhasan, T. Wysocki and E. Dutkiewicz, A review of routing protocols for mobile ad hoc networks. Ad Hoc Networks, vol. 2, pp. 1-22.
[7] G. Beni, From swarm intelligence to swarm robotics. In: Swarm Robotics, Springer Berlin Heidelberg, vol. 3342, pp. 1-9, 2005.
[8] W. Burgard, M. Moors, C. Stachniss and F. Schneider, Coordinated multi-robot exploration. IEEE Transactions on Robotics, 2005.
[9] W. Burgard, M. Moors, D. Fox, R. Simmons and S. Thrun, Collaborative multi-robot exploration. In: Robotics and Automation, IEEE International Conference, vol. 1.
[10] M.A. Hsieh, A. Cowley, V. Kumar and C.J. Taylor, Maintaining network connectivity and performance in robot teams. Journal of Field Robotics, vol. 25.
[11] E. Sahin, Swarm robotics: From sources of inspiration to domains of application.
In: Proceedings of the First International Workshop on Swarm Robotics, Springer, vol. 3342.
[12] C. Blum and X. Li, Swarm intelligence in optimization. Springer Berlin Heidelberg, 2008.
[13] W. Sheng, Q. Yang, J. Tan and N. Xi, Distributed multi-robot coordination in area exploration. Robotics and Autonomous Systems, vol. 54.
[14] M.N. Rooker and A. Birk, Multi-robot exploration under the constraints of wireless networking. Control Engineering Practice, vol. 15, 2007.
[15] E. Stump, A. Jadbabaie and V. Kumar, Connectivity management in mobile robot teams. In: Robotics and Automation (ICRA), IEEE International Conference.
[16] J. Vazquez and C. Malcolm, Distributed multi-robot exploration maintaining a mobile network. In: Intelligent Systems, Proceedings of the 2nd International IEEE Conference, vol. 3.

[17] G. Notarstefano, K. Savla, F. Bullo and A. Jadbabaie, Limited-range connectivity among second-order agents. In: American Control Conference, IEEE, 2006.
[18] M. Schuresko and J. Cortés, Distributed motion constraints for algebraic connectivity of robotic networks. Journal of Intelligent and Robotic Systems, vol. 56.
[19] A. Giagkos and M.S. Wilson, BeeIP - A swarm intelligence based routing for wireless ad hoc networks. Information Sciences, vol. 265, 2014.
[20] M.S. Couceiro, R.P. Rocha and N.M. Ferreira, Ensuring ad hoc connectivity in distributed search with robotic Darwinian particle swarms. In: Safety, Security, and Rescue Robotics (SSRR), IEEE International Symposium, 2011.
[21] A. Konak, G.E. Buchert and J. Juro, A flocking-based approach to maintain connectivity in mobile wireless ad hoc networks. Applied Soft Computing, vol. 13.
[22] X. Zhong and Y. Zhou, Maintaining wireless communication coverage among multiple mobile robots using fuzzy neural network. In: Mechatronics and Embedded Systems and Applications (MESA), IEEE/ASME International Conference, 2012.
[23] M. Ayad, J.J. Zhang, R. Voyles and M.H. Mahoor, Mobile robot connectivity maintenance based on RF mapping. In: Intelligent Robots and Systems (IROS), IEEE/RSJ International Conference, 2013.
[24] D.D. Kouvatsos and G.M. Miskeen, Performance related security modelling and evaluation of RANETs. Wireless Personal Communications, vol. 64.
[25] C. Müller-Schloer, H. Schmeck and T. Ungerer, Organic Computing - A Paradigm Shift for Complex Systems. Springer-Verlag.
[26] M. Ayari, Z. Movahedi, G. Pujolle and F. Kamoun, ADMA: Autonomous decentralized management architecture for MANETs: A simple self-configuring case study. In: Proceedings of the 2009 International Conference on Wireless Communications and Mobile Computing: Connecting the World Wirelessly, ACM, 2009.

Periodic/Aperiodic Tasks Scheduling Optimization for Real Time Embedded Systems with Hard/Soft Constraints

Fateh Boutekkouk, Dept. of Mathematics and Computer Science, University of Oum El Bouaghi, Oum El Bouaghi, Algeria
Soumia Oubadi, Dept. of Mathematics and Computer Science, University of Oum El Bouaghi, Oum El Bouaghi, Algeria

Abstract: This paper deals with Real Time embedded systems scheduling optimization using genetic algorithms. The application is modeled as a set of periodic/aperiodic task graphs with precedence and hard/soft constraints. The architecture, in turn, is modeled as a graph of embedded processors connected by a shared bus or a buses hierarchy with fast links. Our proposed approach takes advantage of both static and dynamic preemptive scheduling. The developed genetic algorithm uses a binary coding and tries to minimize the response time and the number of tasks missing their deadlines.

Keywords: Real Time Embedded Systems, Scheduling, Genetic Algorithms.

I. INTRODUCTION

In recent years, Real Time Embedded Systems (RTES) have become omnipresent. This technology is deeply attached to our daily life and can be found in nearly all domains, such as transport, home appliances, consumer electronics, games, telecommunications, multimedia, aerospace, military, etc. RTES are reactive systems with sensors/actuators tailored to execute specific tasks that can be periodic, aperiodic or sporadic. These tasks can be subject to a variety of temporal constraints, the most important one being the deadline. In fact, RTES are classified as hard, soft or firm systems; this taxonomy depends on whether the deadline is respected or not and on its impact on the environment [4]. RTES design faces a big challenge and must minimize the overall cost and the power consumption. In this context, we are interested in high level performance estimation and optimization of RTES with periodic/aperiodic tasks and hard/soft constraints targeting multiprocessor architectures.
Here, we must note some differences between traditional multiprocessor architectures and RTES multiprocessor architectures. Firstly, an RTES architecture is based on distributed memory, not shared memory. Secondly, an RTES architecture generally includes an RTOS (Real Time Operating System) implemented as a kernel without conventional files, complex I/O or memory management mechanisms (like pagination); the RTOS is focused on real time scheduling. An important difference is at the microprocessor level: an embedded processor is characterized by a low clock frequency, a low memory capacity and a low power consumption, and it may change its frequency dynamically. According to the literature, we can state that most existing works target RTES with periodic or aperiodic tasks and hard or soft constraints. Our effective contribution resides in the proposition of a mixed RTES model that integrates periodic and aperiodic tasks with hard and soft constraints. As is well known, the scheduling problem is a difficult problem. It can be resolved using meta-heuristics; a meta-heuristic applies to a vast class of problems with imprecise or incomplete data. Genetic Algorithms (GAs) appear to be a good choice for solving complex, non-linear, multi-objective and multi-modal problems, which is the case in RTES. We can, for instance, optimize many objectives like the response time, the power consumption, the processor usage ratio, etc. In this work, however, we employ GAs to optimize (minimize) the mean response time of tasks and the number of tasks missing their deadlines, while balancing the processors usage ratios. The rest of the paper is organized as follows: section two is devoted to related work. Our proposed RTES model is developed in section three. Section four presents a GA for RTES performance optimization. The experimentation with some results is discussed in section five before the conclusion. II.
RELATED WORK

The application of GAs to resolve the scheduling problem is not new. For traditional multiprocessor systems (not real time), the objective was primarily to minimize the makespan, which is the time elapsed between the first and the last task in the system. Several GAs were developed [1, 2, 3, 7]. In the real time context, tasks act as infinite loops, and the most important parameter is the response time, that is, the interval between the task arrival time and its end time. Several works tried to apply GAs or multi-objective GAs to optimize real time multiprocessor systems performance [5, 6, 8, 9]. According to the literature, we can state that most works make strict hypotheses to simplify performance analysis. For instance, they

target only one class of real time systems (i.e. periodic tasks) with one type of constraints (i.e. soft). We note that in most cases it is difficult to compare the performances of the developed GAs because the hypotheses are different. Contrary to these works, our objective is to develop an RTES model representing both periodic and aperiodic tasks with hard and soft constraints. Such a model must be realistic but, at the same time, it has to simplify performance analysis.

III. RTES MODELING

RTES modeling is the first step in performance optimization. Several models of computation have been proposed in the literature; in this work, however, we are interested in task graphs with precedence relations.

A. Application modeling

The RTES logical part, or application, is modeled as a set of task graphs (TG). Here, we distinguish between two classes of TG: periodic TG (PTG) and aperiodic TG (ATG). Tasks scheduling uses two policies: static scheduling, where tasks priorities are pre-calculated and then fixed for all periods or pseudo-periods according to their relative deadlines (DM: Deadline Monotonic) or periods (RM: Rate Monotonic), and dynamic scheduling, where priorities are randomly assigned to tasks for each period or pseudo-period. Messages between tasks are also modeled in both PTG and ATG. A message is activated only when the task producing it completes its execution. Each message has a size (in KBytes). A message m0 has a priority higher than a message m1 if the destination task of m0 has a relative deadline (or a period) less than that of the destination task of m1. We assume that messages have no period, but when the period or the pseudo-period of a TG is reached, all transmissions are stopped and the TG messages parameters are initialized.

B. Architecture modeling

The RTES physical or hardware part is modeled by a graph where nodes are processors and arcs are buses.
Our hardware architecture is a multiprocessor architecture with a shared bus, or a buses hierarchy with bridges and fast links (point-to-point connections). Each bus, bridge or fast link has a speed (in KBytes/s). Each embedded processor is characterized by a computing capacity, a local memory and an RTOS (Real Time Operating System) to execute one or more tasks. Figure 2 shows an example of hardware architecture with eight processors, three buses, three bridges and two fast links.

Fig. 1. Example of PTG and ATG (soft and periodic tasks, hard and periodic tasks, soft and aperiodic tasks).

1. PTG: It is composed of periodic tasks with messages. Each TG has a period P, so all tasks belonging to the same TG have the same period P. We assume that all periodic tasks are synchronous (have the same arrival time). Each task has a relative deadline D. Here, we can distinguish between PTG with soft tasks (in green) and PTG with hard tasks (in red). Each soft task is characterized by an execution time (when it is allocated to a processor) expressed as an ACET (Average Case Execution Time), and each hard task by a WCET (Worst Case Execution Time).

2. ATG: It is composed of aperiodic tasks with messages. We assume that all aperiodic tasks are soft. The arrival dates of aperiodic tasks are generated following the Poisson law with parameter λ. Each aperiodic task has a relative deadline and an ACET. In order to simplify performance analysis, we assign to each aperiodic task a pseudo-period. The length of this pseudo-period equals the mean of the inter-arrival times generated by the Poisson law for each aperiodic task.

Fig. 2. Example of hardware architecture with eight processors, three buses, three bridges and two fast links.

We assume that all embedded processors use the same scheduling policy and have limited size queues to stock ready tasks.
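The pseudo-period computation for aperiodic tasks (Section III.A.2) can be sketched as follows. This is our own illustration: arrivals following a Poisson process of rate λ have exponential inter-arrival times with mean 1/λ, and the pseudo-period is the mean of the generated inter-arrival times. The function names are assumptions.

```python
import random

def inter_arrival_times(lam, n, seed=0):
    """Generate n inter-arrival gaps of a Poisson process of rate lam
    (exponentially distributed with mean 1/lam)."""
    rng = random.Random(seed)
    return [rng.expovariate(lam) for _ in range(n)]

def pseudo_period(gaps):
    """Mean inter-arrival time, used as the aperiodic task's pseudo-period."""
    return sum(gaps) / len(gaps)
```

For λ = 0.5, the pseudo-period converges toward 1/λ = 2 as the number of generated arrivals grows.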
C. Tasks and messages allocation

Tasks and messages allocation consists of assigning tasks to processors and messages to buses. Messages allocation depends on tasks allocation. The tasks allocation precedes the tasks scheduling and can be made randomly or according to some greedy algorithms.

IV. OUR GENETIC ALGORITHM

The GA steps are presented in Fig. 3.
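As an illustration of the chromosome coding and reparation described below, here is a minimal sketch under our own naming (not the authors' code); the repair strategy of moving extra tasks to the least loaded processor is our own simplification.

```python
import random

def random_chromosome(n_tasks, n_procs, rng):
    """Binary allocation matrix: Mat[i][j] = 1 iff task j runs on processor i."""
    mat = [[0] * n_tasks for _ in range(n_procs)]
    for t in range(n_tasks):
        mat[rng.randrange(n_procs)][t] = 1
    return mat

def repair(mat, queue_size):
    """Reparation pass: each task on exactly one processor, and no
    processor holding more tasks than its queue size."""
    n_procs, n_tasks = len(mat), len(mat[0])
    for t in range(n_tasks):  # each task: exactly one processor
        procs = [p for p in range(n_procs) if mat[p][t]]
        for p in procs[1:]:
            mat[p][t] = 0
        if not procs:
            mat[0][t] = 1
    for p in range(n_procs):  # each processor: respect its queue size
        tasks = [t for t in range(n_tasks) if mat[p][t]]
        for t in tasks[queue_size:]:
            mat[p][t] = 0
            dest = min(range(n_procs), key=lambda q: sum(mat[q]))
            mat[dest][t] = 1  # move the extra task to the least loaded processor
    return mat
```

After reparation, every column of the matrix sums to 1 (one processor per task) and no row exceeds the queue size, matching the reparation cases listed in the text.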

Fig. 3. GA steps: generation of the initial population (tasks allocated to processors randomly), chromosome reparation, tasks and messages scheduling, evaluation, stop criterion test (if not met, best chromosomes selection, crossover, mutation, then insertion and replacement; otherwise, the optimal solution is returned).

Fig. 4. Chromosome coding (binary matrix with tasks T1..T5 in columns and processors in lines; a gene is one matrix cell).

A. GA with static priorities

1. Chromosome coding: A chromosome is coded as a binary matrix Mat. Columns correspond to tasks and lines correspond to processors: Mat(i,j) = 1 if task j is allocated to processor i, and 0 otherwise.

2. Chromosome reparation: Used to repair chromosomes in the cases where a processor has zero tasks, a processor has a number of tasks surpassing its queue size, or a task is assigned to zero or more than one processor.

3. Tasks and messages scheduling: Two scheduling algorithms are used in our approach: RM and DM. For aperiodic tasks, we apply an aperiodic task server algorithm: we create a new periodic task, or server (serving aperiodic tasks), with a lower priority, whose period is equal to the computed pseudo-period. The analysis time is equal to the least common multiple of the task periods. Messages allocation and scheduling are done in a similar fashion to tasks allocation and scheduling. Note that if two dependent tasks are allocated to the same processor, the message transfer time between the two tasks is considered null; otherwise, the message transfer time depends on the way the processors are connected.

Fig. 5. Messages allocation (e.g., the message M1 is transferred to Bus 1).

4. Evaluation: Our objective is to minimize the tasks response times and the number of tasks missing their deadlines, while balancing the processors usage ratios. The response time of a task is the elapsed time between the task activation (arrival) and the task end time. We add to this time the overhead due to message transfer over buses.
The message transfer time is calculated on the basis of the buses speed. In order to evaluate the chromosomes fitness, we define two functions named TRMS and TUM. TRMS is the mean response time of the system tasks:

TRMS = (1/n) Σ_{i=1..n} TRM_i   (1)

where TRM_i is the mean response time of task i and n is the number of tasks.
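The evaluation functions of equations (1)-(4) of this section can be sketched as follows; this is our own illustration, with names of our choosing.

```python
def trm(response_times):
    """TRM_i: mean response time of one task over its activations (eq. (2))."""
    return sum(response_times) / len(response_times)

def trms(per_task_response_times):
    """TRMS: mean of the per-task mean response times (eq. (1))."""
    means = [trm(times) for times in per_task_response_times]
    return sum(means) / len(means)

def tum(occupation_times, total_time):
    """TUM: mean processor usage ratio, TU_j = Uoccup_j / total_time
    averaged over the processors (eqs. (3)-(4))."""
    ratios = [u / total_time for u in occupation_times]
    return sum(ratios) / len(ratios)
```

For instance, two tasks with response times {2, 4} and {6} give TRMS = (3 + 6) / 2 = 4.5, and two processors occupied 30 and 60 time units out of 60 give TUM = 0.75.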

The mean response time of a task is computed over its activations:

    TRM_i = (1/nb_activ) * sum_{k=1..nb_activ} TR_k        (2)

where TR_k is the response time of the task in activation k and nb_activ is the number of activations.

TUM is the mean usage ratio of the system processors:

    TUM = (1/m) * sum_{j=1..m} TU_j        (3)

where TU_j is the usage ratio of processor j and m is the number of processors:

    TU_j = Uoccup_j / Tps_Sim        (4)

where Uoccup_j is the occupation time of processor j and Tps_Sim is the simulation time.

In order to compute the number of tasks missing their deadlines, we define a counter that is incremented whenever a task misses its deadline.

5. Stop criterion: As a stop criterion, we chose a fixed number of generations; after this number, the GA is stopped.

6. Selection: Selection is based on the fitness value. In our case, the user can select the response time or the number of tasks missing their deadlines as the fitness function. Two selection techniques are used: elitism and tournament.

7. Crossover: Two crossover techniques are used: one-point and two-point. Crossover is performed between the best selected chromosome and the worst one of the population. In one-point crossover, the cutting point is computed from the ratio of the fitness values of the two parents. In two-point crossover, the cutting points are selected randomly.

8. Mutation: This operator is applied with a certain probability P_mut. It consists in changing the value of a certain gene in the chromosome (changing the processor of a task). In our GA, we generate a random number between 0 and 1; if it is less than P_mut, the mutation is done, otherwise the value of the gene remains unchanged.

9. Insertion and replacement: In order to enhance the quality of chromosomes, we keep the best half of the chromosomes (parents) of the population for crossover, then we insert the parents to build a new population. Since each crossover produces two new chromosomes, the population size remains constant.

B. GA with dynamic priorities

With this technique, we assign to each task a random priority during each generation. The chromosome is composed of two matrixes: one for task allocation and the other for task priority assignment.

Fig. 6. Chromosome coding with dynamic priorities: an allocation matrix (tasks x processors) and a priority matrix (tasks x priorities)

The role of chromosome reparation in this technique is to preserve the task priorities in the precedence relations of the task graph (a predecessor task always has a higher priority than its successor). Independent tasks that are allocated to the same processor cannot have the same priority. All other genetic operators are similar to the first technique; the main distinction is the chromosome coding (all genetic operators use two matrixes instead of one).

V. EXPERIMENTATION

We have tested our GA on a typical example including 20 tasks distributed over 4 task graphs (TG) and 3 different architectures (see figures 7, 8, 9 and 10). Task T2 is a server task. For the sake of space, we do not show the task, processor and bus parameters. Figures 11 and 12 show respectively the mean response time and the number of system tasks beyond their deadlines over the GA iterations for the three architectures. Table I gives the best solutions found for the three examples after a set of experiments. Our tests are done with the following parameters: Tasks_number = 20, Sched_policy = DM, Crossover = one point, P_mut = 0.1, Tps_Sim = 60, pop_size = 100.

Fig. 7. A typical example (soft PTG, soft ATGs, hard PTG)
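As an illustration of the crossover and mutation operators described above, the following minimal sketch (our own code with invented names; a chromosome is a simple task-to-processor list) derives the one-point cut from the parents' fitness ratio, which is our reading of the paper's description, and mutates a gene with probability P_mut:

```python
import random

def one_point_crossover(best, worst, fit_best, fit_worst):
    """One-point crossover between the best and worst chromosomes.
    The cut point is derived from the ratio of the parents' fitness
    values (our interpretation of the paper's description)."""
    n = len(best)
    cut = max(1, min(n - 1, round(n * fit_best / (fit_best + fit_worst))))
    child1 = best[:cut] + worst[cut:]
    child2 = worst[:cut] + best[cut:]
    return child1, child2

def mutate(chromosome, processors, p_mut=0.1):
    """With probability p_mut, reassign one randomly chosen gene
    (a task) to another processor; otherwise leave it unchanged."""
    chromosome = list(chromosome)
    if random.random() < p_mut:
        gene = random.randrange(len(chromosome))
        chromosome[gene] = random.choice(processors)
    return chromosome
```

Since each call to `one_point_crossover` returns two children, replacing the worst half of the population with the children of the best half keeps the population size constant, as stated above.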

Fig. 8. Shared bus

Fig. 9. Two buses with a bridge

Fig. 10. Two buses with a bridge and two fast links

Fig. 11. Mean response time progression over generations (average response time vs. number of generations, for the first, second and third examples)

Fig. 12. Tasks beyond their deadlines number progression over generations (for the first, second and third examples)

Discussion: According to the results, we can remark that the mean response time and the number of tasks beyond their deadlines improve linearly over the generations for the three examples. Note that the mean response time in the case of the buses hierarchy with fast links is slightly lower than in the first and second architectures; the same holds for the number of tasks beyond their deadlines. The shared bus architecture performs slightly better than the buses hierarchy; this may be due to the bridge overhead, and even allocation can have a big impact on scheduling (i.e., if two dependent tasks are allocated to the same processor, the message transfer time is null). The processor usage ratios for the three architectures have the same value because our GA tries to balance the load across all processors.

TABLE I. BEST RESULTS (columns: example, mean response time on 3 CPUs, usage ratios of the CPUs, number of tasks beyond their deadlines)

Figures 13 and 14 show respectively the mean response time and the number of system tasks beyond their deadlines over the GA iterations for the three architectures in the case of dynamic priorities. Table II gives the best solutions found for the three examples after a set of experiments.

Fig. 13. Mean response time progression over generations (dynamic priorities)

Fig. 14. Tasks beyond their deadlines number progression over generations (dynamic priorities)

TABLE II. BEST RESULTS FOR DYNAMIC PRIORITIES (columns: example, mean response time on 3 CPUs, usage ratios of the CPUs, number of tasks beyond their deadlines)

Discussion: According to the results, we can observe that the mean response time and the number of tasks beyond their deadlines improve non-linearly over the generations for the three examples. On the other hand, the shared bus architecture performs better than the second and third architectures; however, the third architecture shows a lower number of tasks beyond their deadlines. Table II shows clearly that performance in the case of dynamic priorities is lower than with static priorities. This is due to the fact that dynamic priorities are generated randomly.

VI. CONCLUSION

In this paper, we proposed a new RTES model for both periodic and aperiodic tasks with hard and soft constraints. In order to optimize RTES performance, we developed a GA implementing static and dynamic preemptive scheduling. Static priorities are assigned following the RM and DM algorithms. Dynamic priorities are assigned randomly, but a reparation mechanism is developed to preserve the precedence constraints. We have tested our GA on a typical example with three different architectures. According to the obtained results, we can conclude that performance with static priorities is higher than with dynamic priorities, and that allocation may have a big impact on scheduling. As perspectives, we plan to run more tests on our GA to understand the impact of the GA parameters on the quality of solutions, and to integrate power consumption into the fitness function.

REFERENCES

[1] R. C. Corrêa, A.
Ferreira, and P. Rebreyend, Scheduling Multiprocessor Tasks with Genetic Algorithms, IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 8, August.
[2] M. K. Dhodhi, I. Ahmad and Ishfaq Ahmed, A multiprocessor scheduling scheme using problem-space genetic algorithms, IEEE International Conference on Evolutionary Computation, Vol. 1, Perth, WA, Australia, Dec.
[3] M. S. Jelodar, S. N. Fakhraie, F. Montazeri, S. M. Fakhraie, M. Nili Ahmadabadi, A Representation for Genetic-Algorithm-Based Multiprocessor Task Scheduling, 2006 IEEE Congress on Evolutionary Computation, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21.
[4] M. Kasim Al-Aubidy, Real-Time Systems, Classification of Real-Time Systems, Computer Engineering Department, Philadelphia University, Summer Semester.
[5] Y. Li, Y. Yang, M. Ma, R. Zhu, A Problem-Specific Genetic Algorithm for Multiprocessor Real-time Task Scheduling, The 3rd International Conference on Innovative Computing Information and Control, IEEE.
[6] M. R. Miryani, M. Naghibzadeh, Hard Real-Time Multiobjective Scheduling in Heterogeneous Systems Using Genetic Algorithms, 2009 IEEE Proceedings of the 14th International CSI Computer Conference (CSICC'09).
[7] M. Rinehart, V. Kianzad, and S. S. Bhattacharyya, A Modular Genetic Algorithm for Scheduling Task Graphs, Technical Report UMIACS-TR, Institute for Advanced Computer Studies, University of Maryland at College Park, June.
[8] N. Sedaghat, H. T. Yazdi, and M. Akbarzadeh-T., Pareto Front Based Realistic Soft Real-Time Task Scheduling with Multi-objective Genetic Algorithm in Unstructured Heterogeneous Distributed System, 5th International Conference, GPC 2010, Hualien, Taiwan, May 10-13.
[9] I. Stierand, P. Reinkemeier, T. Gezgin, P. Bhaduri, Real-time scheduling interfaces and contracts for the design of distributed embedded systems, 8th IEEE International Symposium on Industrial Embedded Systems (SIES), Porto.

Topic 5: Web Technology and Knowledge

Developing a Knowledge Distributed Architecture to Discover and Compose Semantic Web Services

Mohamed Gharzouli, Department IFA, Faculty of New Technologies of Information and Communication (NTIC), University Constantine 2, Constantine, Algeria

Djamel Benmerzoug, Department TLSI, Faculty of New Technologies of Information and Communication (NTIC), University Constantine 2, Constantine, Algeria

Abstract: The first generation of works on Web services architectures proposed different solutions based on centralized discovery methods (such as UDDI), where Web services are described by service interface functions and publish their capabilities and functionalities with a registry. However, these methods are not adapted to dynamic interactions. For this reason, many solutions have recently been proposed for the distributed discovery of Web services, and the majority of research works illustrate P2P-based discovery methods. In this paper, we present a P2P-based architecture for semantic Web services discovery and composition. Instead of using a centralized repository, this solution uses a knowledge-centered point that stores a main ontology consulted by the different peers to ensure homogeneity. In addition, the main objective of this work is to propose a solution that permits the reuse of the compositions already realized in the network.

Keywords: Semantic Web services, P2P computing, Web services discovery, Ontology, Distributed applications

I. INTRODUCTION

Peer-to-Peer (P2P) networks have recently received great attention due to their inherent scalability and flexibility. P2P systems are decentralized, self-organizing distributed systems that cooperate to exchange data. The increasing popularity of P2P systems for file sharing indicates general interest in resource sharing [7].
In the last few years, research on P2P systems has been quite intensive, and has produced significant results in scalability, robustness, location, distributed storage, and system measurements. P2P systems are being increasingly used in many different application domains. Among these emergent research fields, Web services based on P2P computing require special attention regarding collaboration and interoperability in a distributed computing environment. Web services technology is considered a revolution for the Web, in which a network of heterogeneous software components interoperate and exchange dynamic information [4], [5], [6]. Recently, P2P technologies have been used to improve the automatic discovery of Web services. P2P systems provide a scalable alternative to centralized systems by distributing the Web services among all peers. P2P-based approaches offer a decentralized and self-organizing context, where Web services interact with each other dynamically. On the other hand, other technologies are also used to facilitate the dynamic discovery of Web services. An important improvement has been made by Semantic Web technologies. In this context, the Semantic Web can intervene to resolve the problem of the purely technical description of Web services in discovery. A semantic description of Web services (for example, using an OWL-S description) is more understandable by the machine, which facilitates the dynamic discovery of Web services. Therefore, in order to exploit the advantages of P2P networks and the Semantic Web, we combine these technologies to propose a strategy to manage, discover and compose Web services. The main objective of this work is to improve the architecture already proposed in [1] through a development process that tries to reuse the compositions already realized in the network. This reduces the time needed to search for and discover Web services in a P2P network.
The rest of this paper is organized as follows: in section 2, we discuss the different methods of Web services discovery. Section 3 presents a P2P framework for semantic Web services discovery and composition. In section 4, we present a development process that improves this framework. Section 5 discusses some related works and section 6 concludes the paper.

II. CENTRALIZED METHODS FOR WEB SERVICES DISCOVERY

Centralized discovery methods (such as UDDI) represent the first generation of works on Web services architectures, where Web services are described by service interface functions and publish their capabilities and functionalities with a registry (Figure 1) [24]. In this architecture, the discovery of Web services raises two main hard problems. The first is the technical description, generally realized with WSDL: it provides only a technical description, whereas automatic Web services discovery requires a more intelligible description that is more understandable by the machine. The second problem is the centralized point of publication and discovery, which can be a repository like UDDI or a search engine like Service Finder [25].

Fig. 1. Web services architecture

To solve these problems, we discuss in the following paragraphs the use of the Semantic Web and P2P technologies in the context of Web services discovery.

A. Web services and Semantic Web services

Web services present a network of heterogeneous software components that interoperate and exchange information with a high degree of interoperability. In this context, the first specifications proposed in the Web service architecture offer a good level of technical interoperability, especially SOAP (Simple Object Access Protocol [26]) and WSDL (Web Services Description Language [27]). Even though the latter represents one of the most important factors in the success of Web services technology, it provides only a technical description. However, automatic Web services discovery requires a more intelligible description. Thus, Web services should act autonomously with as minimal human intervention as possible [6]; they should be able to discover other services that have particular capabilities and realize precise tasks. In this context, Semantic Web technologies can intervene to resolve the problem of the purely technical description of Web services. A semantic description is more understandable by the machine, which facilitates the dynamic discovery of Web services. In this field, many semantic languages have been proposed, such as OWL-S [28], WSMO [29], METEOR-S [30] and other specifications. Among these languages, OWL-S is the most powerful and the most used by the Semantic Web community. For this reason, we used this language in our work, together with some related tools such as wsdl2owls [31] and the OWL-S MX matchmaker [32].

B. P2P based methods for Web services discovery

The centralized discovery methods are not adapted to dynamic interactions: they restrict the scalability of flexible and dynamic environments [2], [8], induce performance bottlenecks and may result in single points of failure [10].
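To make the idea of semantic matching concrete, here is a minimal sketch of ours (a toy illustration, not the OWL-S MX algorithm; the concept names and the dictionary-based ontology are invented): a provided output satisfies a requested input when the requested concept subsumes the provided one through subclass relations.

```python
# Toy ontology: each concept maps to its direct superclass (None = root).
# All concept names here are invented for illustration.
ONTOLOGY = {
    "Price": None,
    "PriceInEuro": "Price",
    "PriceInDollar": "Price",
    "BookInfo": None,
}

def subsumes(general, specific, ontology=ONTOLOGY):
    """True if `specific` equals `general` or is a (transitive) subclass."""
    while specific is not None:
        if specific == general:
            return True
        specific = ontology.get(specific)
    return False

def matches(provided_output, requested_input):
    """Plug-in match: the requested input concept must subsume the
    concept actually provided as output."""
    return subsumes(requested_input, provided_output)
```

A purely syntactic (WSDL-level) comparison would reject `PriceInEuro` against a requested `Price`; the subsumption walk accepts it, which is the kind of inference a technical description alone cannot support.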
Moreover, the centralized control of published services suffers from many problems, such as high operational and maintenance costs. Furthermore, the universal UDDI repository suffers from a lack of moderation: many published Web services are not available or are not implemented by the providers. In addition, even if the Web services are described semantically, one of the major problems with the existing structure is that UDDI does not capture the relationships between entities in its directory and is therefore not capable of using the semantic information to infer relationships during search [11]. Secondly, UDDI supports search based only on the high-level information specified about businesses and services; it does not get to the specifics of the capabilities of services during matching [12]. To solve these problems, other architectures have been proposed to facilitate the discovery and composition of Web services. Recently, many solutions have been proposed for the distributed discovery of Web services. The majority of research works illustrate P2P-based discovery methods, and many of them are related to the automated discovery and composition of semantic Web services in P2P networks. These research works are categorized according to the different types of P2P networks: unstructured (Gnutella v0.4 [33]), hybrid (Gnutella v0.6 [34]) or structured (like Chord [13]). Among these types, in our work presented in [3] we employed unstructured P2P networks for semantic Web services discovery and composition, for which we proposed a distributed strategy based on epidemic algorithms. Later, in [1], we proposed a distributed architecture to implement the proposed strategy. This architecture is presented in the following section.

III. A P2P FRAMEWORK FOR SEMANTIC WEB SERVICES DISCOVERY AND COMPOSITION

In our work presented in [1], we described a distributed framework which implements the discovery strategy already presented in [3].
In an unstructured P2P network, each peer plays the role of both a customer and a provider of Web services. This framework is installed on the various peers of the network, whereas a central base of OWL ontologies (and OWL-S descriptions) is used as a reference to develop the various local ontologies and semantic descriptions of the different Web services provided in the network (the global architecture is presented in figure 2).

Fig. 2. Global architecture

In this architecture, each peer in the network implements a number of Web services described semantically with OWL-S using a local ontology. The knowledge-centered base is consulted by the different peers to develop or enrich the local ontologies of every peer. Using the same language of semantic

descriptions (OWL-S) and a central base of ontologies is an important point to ensure homogeneity. The central OWL ontologies base contains the different concepts used in the diverse fields of the proposed Web services. Thus, each peer that wants to participate in P2P Web services composition can generate OWL-S descriptions of its Web services. The central base also offers some OWL-S descriptions for various Web services; if a peer has a similar Web service, it can reuse these descriptions. In this context, an example of a universal OWL ontology and a collection of OWL-S descriptions for a variety of Web services was generated manually by Ganjisaffar and Saboohi; it contains more than 240 semantic service descriptions and a main ontology called concepts.owl [36] (in the following sections, we will present some examples of this ontology). In the following figure, we describe the distributed framework installed on the various peers of the network.

Fig. 3. A distributed framework to discover and compose semantic Web services

The main idea of this framework is to make the discovery of Web services distributed. The framework contains three layers: the semantic manager, the local composition module and the P2P composition module (figure 3).

A. The Semantic Manager module

This module manages the semantic descriptions of Web services. The user uses an OWL-S generator and a base of local OWL ontologies (figure 3, arrows 2). The latter must be developed according to the central ontologies (figure 3, arrow 1). The OWL-S generator uses the WSDL descriptions and the ontologies of the fields to generate the OWL-S descriptions of Web services. In our case we used the wsdl2owl-s generator [31] as a core to implement this component.

B. The Local Composition Module

The objective of this module is to discover a local Web service to answer the external requests of the other peers. To achieve this goal, we propose two components: the local search engine and the local composition engine. The local search engine has two possible tasks: searching for a basic Web service or for an eventual local composite Web service. When the peer receives a request from the network (figure 3, arrow 3), the P2P composition engine passes the request to the local search engine (figure 3, arrow 4). The latter searches for a basic Web service to answer this request. At the same time, the local search engine uses an OWL-S matchmaker to discover a possible composition, from the set of local Web services, that responds to the request (figure 3, arrows 5). As a result, we have three possible scenarios:

If there is no basic Web service nor a probable composition, the local search engine returns a negative response to the P2P composition engine.

If there is a basic Web service or an eventual composition, then in the invocation step (in the composition) the local search engine generates a BPEL file and sends it to the local composition engine (figure 3, arrow 6), which uses the service invoker to invoke the basic Web service or the set of Web services according to the process defined in the BPEL file.

If there is a probable semi-composition formed from a single Web service or a set of Web services, the local search engine generates a request and proposes it to the P2P composition engine. The latter can send this request to other peers in the network that can continue the discovery operation.
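The three scenarios above can be summarized by the following sketch (the function names and the tuple-based return convention are our own illustration, not part of the framework):

```python
def handle_request(request, find_basic, compose_locally):
    """Dispatch an incoming request to one of the three scenarios.

    `find_basic(request)` returns a local basic service or None.
    `compose_locally(request)` returns (services, remaining_request),
    where remaining_request is None when the composition is complete.
    """
    # Scenario: a basic local Web service answers the request.
    basic = find_basic(request)
    if basic is not None:
        return ("service", [basic])
    # Scenario: a full local composition (a BPEL file would be
    # generated and passed to the local composition engine here).
    services, remaining = compose_locally(request)
    if services and remaining is None:
        return ("service", services)
    # Scenario: only a semi-composition exists; the remaining request
    # is proposed to the P2P composition engine for forwarding.
    if services:
        return ("semi-composition", remaining)
    # Scenario: nothing local at all.
    return ("negative", None)
```

The design point is that the basic-service lookup and the matchmaker-based composition run against the same request, and only their combined outcome decides whether the P2P layer is involved.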

C. The P2P Composition Module

This module is the interface between each peer and the P2P network. It is the main component used to realize a collaborative composition among a set of participant peers. This layer contains three components: the composition table (already presented in [3]), a filter and the P2P composition engine. The filter is a program used directly by the user to clean the composition table of inactive compositions (figure 3, arrow 7). This operation is very important in a dynamic environment like P2P networks, where many peers join and quit the network frequently; for this reason, many P2P compositions become inaccessible when one or more participant peers are absent. The filter computes statistics on the composition table entries to detect the compositions that have been inactive for a long time (figure 3, arrow 8). These statistics offer the user a clear view of the states of the compositions in the table. The P2P composition engine contains a request/response component to receive or send requests (figure 3, arrow 9). This component is used in several possible scenarios related to those already explained for the local composition layer. Furthermore, the user can use the P2P composition engine to start a search operation in the network. When it receives a request from another peer of the network (using the P2P platform) (figure 3, arrow 3), the P2P composition engine sends the request to the local composition engine. The latter can return three possible responses:

A positive response: in this case the P2P composition engine sends the response to the requesting peer.

A negative response: the P2P composition engine evaluates a new TTL and transfers the request to other peers (the direct neighbors in the network).
A semi-composition proposition: the P2P composition engine first searches the composition table for a composition that responds to the main request, or it can continue the composition using the request of the semi-composition proposition (figure 3, arrow 10). If there is a composition in the table, the P2P composition engine sends the request to the initiator peer of this composition. Otherwise, the P2P composition engine evaluates the TTL and continues the discovery using the request proposed by the semi-composition of the local composition engine. In the end, if the discovery finishes successfully, the initiator peer generates the BPEL file to launch the P2P composition. In this case, each participant uses the P2P composition engine and the local composition engine to receive, execute and send the response (figure 3, arrow 11).

The composition table: this table represents the experience of each peer in the network. Every participant peer creates this table to store all the compositions in which it has already participated. It is necessary to give the definitions of the different attributes of the composition table; these concepts are presented in detail in [3]:

Initiator peer: the peer that begins the composition.
compoid: the composition identifier (each composition is defined in the network by its initiator peer and its identifier).
Init-input: initial input of the composite Web service.
Init-output: initial output of the composite Web service.
Goal: the goal of the composite Web service.
Executed services: Web services executed locally (in a peer) to compose the achieved Web service.
Reserve services: local Web services which can be executed in the same composition.
Precedent peers: peers which execute the preceding Web services of the composition.
Next peers: peers which execute the next Web services of the composition.
Reserve peers: peers which can replace the next peers.
State of the composite Web service: the composite Web service is active if all the participant peers are in the network.

D. Discussion of the proposed architecture

As a conclusion, we can deduce that the main problem of this architecture is the long discovery time caused by the semantic matching (OWL-S matchmaking) realized at runtime. For this reason, we present in the following section a development process which can improve this solution. In addition, the use of the composition table is a good solution for reusing the experience of each peer. However, this solution suffers from the problem of the data coherency of this table (we do not address this point in this paper). Also, using only a repository of local experiences in each peer does not give a high degree of reuse. For these reasons, we propose to create a centered base of experiences to increase the probability of discovering a composite Web service in the network and thus decrease the search time.

IV. A DEVELOPMENT PROCESS TO IMPROVE THE PROPOSED ARCHITECTURE

In this section we describe the steps followed to discover and compose semantic Web services using the framework presented in section 3. We divided this process into three main phases: the passive time, the runtime and the reuse of experiences.

A. In the passive time

The following steps are realized in the passive time to decrease the search and discovery time as much as possible, especially the semantic matching, which takes a lot of time. These steps are realized by each provider of Web services (a peer) that wants to collaborate with the other peers of the network to compose new complex Web services.

Development of local ontologies: Using the main central ontology base, each peer develops a local ontology according to its needs. The developer uses only the concepts corresponding to the domains of the provided Web services. It is important to mention that the developer can skip this step.
It is only useful if the provider wants to add some new private concepts to its local ontology compared to the main ontology. In other cases, the developer can directly reuse the

main ontology to generate the OWL-S descriptions of the local Web services. The developer can download the main ontology and modify it locally. The use of the main ontology to generate an OWL-S description is explained in the following step.

Generate the OWL-S descriptions: In the first case, each peer can reuse one or several OWL-S descriptions provided by the central knowledge base. For example, [36] provides about 240 OWL-S descriptions of a large variety of Web services. In this case, if the provider wants to supply a Web service corresponding to one of the provided OWL-S descriptions, he can directly reuse this description. Otherwise, using the WSDL description and the ontology of the domain of a Web service, the developer can generate the semantic description of this Web service. In our work we propose to use the open-source wsdl2owls converter [31], which automatically generates an OWL-S description from the WSDL description. The following example (figure 4) shows the operation of this program.

Fig. 4. The wsdl2owls converter

Figure 4 shows the generation of an OWL-S description of the service ReserveVol, which implements five operations: count, reservervol, create, find and findrange.

Search for an eventual local composition: Each peer in the network implements a set of basic Web services that can respond to the received requests. However, in many cases one Web service cannot respond directly to the request, whereas a composition of several basic Web services (belonging to the same peer) can give the adequate response. For this reason, each peer must search in the passive time for the eventual possible composite Web services. If a peer tries to compose a local Web service at runtime (when it receives a request), it can increase the search time in the network, especially since this operation can end in failure (this point is explained further in the Discovery step of the runtime phase). For these reasons, it is better to search for the eventual composition in the passive time. To realize this goal, we propose to use an OWL-S matchmaker that uses the OWL-S descriptions of a set of Web services to verify whether these Web services can be composed semantically or not. In our work, we used OWLS-MX 2.0, a semantic matchmaker for Web services: it takes as input a set of OWL-S descriptions of Web services and tries to match them [32].

Fig. 5. OWL-S MX Matchmaker (services are added one by one with their OWL-S and WSDL descriptions; the tool lists operations, Web services and inputs/outputs)

B. In the runtime

This phase contains two main steps: the discovery of Web services in a P2P network and the invocation of the Web services to compose a new one.

Discovery of Web services in a P2P network: When it receives a request about a Web service, each peer of the network first tries to find locally a basic (or composite) Web service that responds to this request. Otherwise, the peer searches its local experience repository for a composition already realized, in order to reuse it. Otherwise, the peer transfers the request to other peers of the network. A variety of algorithms have previously been proposed to discover and compose new Web services in P2P networks. Among these algorithms, in our work presented in [3], we proposed a set of epidemic algorithms to discover Web services in an unstructured P2P network. In addition to the input/output matching, this algorithm uses the concept of goal

defined in [2]. The main epidemic algorithm is defined as follows [3]:

Step 1: the peer receives a request from another peer that is searching for a Web service with the arguments Init-Input (initial input), Init-Output (initial output) and goal.
Step 2: the peer first searches for a basic Web service which responds to the request.
Step 3: if there is a local basic Web service, the peer sends the result to the requesting peer.
Step 4: otherwise, the peer tries to compose a new Web service locally, using the search engine. It is important to mention that in this architecture we proposed to realize this task at runtime; in the improved architecture, the peer exploits the result of the step "search for an eventual composition" defined in the passive time.
Step 5: if step 4 gives a result, the peer sends it.
Step 6: otherwise, the peer searches its composition table (local experiences) for a previous composition which responds to the request.
Step 7: otherwise, the peer continues the discovery operation. Before that, the peer can refine the operation by replacing the arguments of the search.

Composition step (using a BPEL engine): If the discovery operation finishes successfully, the initiator peer launches the composition by generating a BPEL file from the returned results (the sequence of peers that implement the discovered Web services). After that, the initiator peer uses the BPEL engine to invoke the Web services according to the generated BPEL file. Figure 6 shows an example of a composition that contains four Web services implemented by three peers. The composite Web service is constituted of two Web services; the others are considered as reserve services in the discovery operation.

Fig. 6. Example of a P2P composition

The objective of this composite service is to give the price of a book in Algerian Dinar. The inputs, outputs and goals of the discovered Web services are given in the following table. TABLE I.
INPUTS, OUTPUTS AND GOALS OF THE DISCOVERED WEB SERVICES

Web service    | Input                        | Output                  | Goal
WS1 (reserve)  | Book info (title, author...) | Price                   | Gives the price in EURO
WS2 (executed) | Book info (title, author...) | Price                   | Gives the price in US Dollar
WS3 (reserve)  | Money in Dollar              | Money in Algerian Dinar | Convertor USD-AD
WS4 (executed) | Money in Dollar              | Money in Algerian Dinar | Convertor USD-AD
Composite WS   | Book info (title, author...) | Price                   | Gives the price in Algerian Dinar

In the composition step, the initiator peer generates the BPEL file to compose the new Web service. The reserve services will be invoked if the primary (executed) Web services do not give a result. The following listing presents the generated BPEL file, which uses four Web services (WS1 = FindPriceLocalPL, WS2 = FindBookPriceServicer, WS3 = ReserveConvertor and WS4 = OtherPeerConvertorPL):

<?xml version="1.0" encoding="utf-8"?>
<process name="findpriceprocess" ..
  <import namespace="http://services/"
          location="../findbookpriceservicerservice/FindBookPriceServicer.wsdl"
          importtype=" wsdl/"/>
  <partnerlinks>
    <partnerlink name="findpricereservepl" ..
                 partnerlinktype="tns:findbookpriceservicerlinktype"
                 partnerrole="findbookpriceservicerrole"/>
    <partnerlink name="findpricelocalpl"
                 partnerlinktype="tns:findpricelocalservicelinktype"
                 partnerrole="findpricelocalservicerole"/>
    <partnerlink name="otherpeerconvertorpl"
                 partnerlinktype="tns:convertorservicelinktype"
                 partnerrole="convertorservicerole"/>
    <partnerlink name="reserveconvertor" ..
    <partnerlink name="partnerlink1"
                 partnerlinktype="tns:findpriceindesiredcurrency"
                 myrole="findpriceindesiredcurrencyporttyperole"/>
  </partnerlinks>
  <variables> </variables>
  <sequence> </sequence>

V. RELATED WORKS

In general, the research works suggested in the domain of semantic Web services composition are classified in the following principal categories:

Importation of Semantic Web technologies into Web services discovery and composition [9], [14], [37].
Use of the Semantic Web (especially ontologies) to search and classify peer contents in P2P networks [18], [19].
Convergence between multi-agent systems (MAS) and peer-to-peer networks for the composition of semantic Web services [7], [35].
Use of formal methods to express the semantics of the specification languages of composite Web services [17], [20].
Evaluation of the performance and quality of composite Web services (QoS) [15], [16].

Recently, several research works have improved the decentralized discovery methods of Web services in P2P networks. A number of centralized and P2P Web service discovery methods have been proposed in the context of Web services composition and Web-services-based business process management. Among these, [2], [6] and [21] have concepts similar to those used in our method. F. Mandreoli et al. [2] present the architecture of FLO2WER, a framework that supports large-scale interoperation of semantic Web services in a dynamic and heterogeneous P2P context. They adopted a hybrid approach that exploits the advantages of centralized registries for service discovery and composition, as well as the dynamism and scalability of unstructured P2P networks. The main idea of the FLO2WER framework is that, while decentralizing the knowledge of which specific services are available in the system, they keep centralized knowledge of which objectives may be satisfied within the network, namely Goals.
Each Goal therefore specifies a sub-network of specific services, and it is stored in an appropriate repository, called the Goal Repository. However, it is not described in detail how the goals are discovered. Moreover, the use of a central repository of goals is similar to central discovery methods based on Web services functionalities. In contrast, our discovery method combines decentralized and centralized solutions. T. Essafi et al. [6] present a scalable P2P approach to service discovery using ontology. This work incorporates the input/output matching algorithm proposed in paper [23] and extends the solution described in paper [22] by adding an encoding that locates servers in a P2P network to simplify rerouting of query messages. Like the previous work [1], this project adopts only network centralization and hierarchy. However, the main objective of our work is to ensure dynamic Web service discovery. Also, this work still relies on the old DAML-S, which proved to be less flexible than the new OWL-S. Z. Zhengdong et al. [21] design a localization mechanism for semantic Web services in CAN-based P2P networks to ensure that each service is registered in a specific node of the public Web. The registration services of public Web nodes are divided by area and shared by all nodes. This work addresses only CAN-based P2P networks; in contrast, although we presented an example based on an unstructured epidemic algorithm, our proposed solution is generic and adaptable, i.e., other types of protocols can be applied in the discovery step.
VI. CONCLUSION AND FUTURE WORK
In this paper, we proposed an architecture for semantic Web service discovery and composition in P2P networks. The main goal of this architecture is to improve the scalability, reliability and stability of the semantic Web service composition system, and to improve ontology-based semantic Web service discovery.
This architecture combines the advantages of centralized and decentralized structures. We first defined the use of a composition table that provides a purely distributed method to discover previously composed P2P Web services. The distribution of this table creates a collaborative workspace where each peer can exploit its experiences. Thus, this table preserves the trace of the various successful compositions for possible future reuse. This characteristic can improve the search time in the network. Also, we proposed to use a single point that contains a knowledge-centered base, which is a main OWL ontology (or more) of the different concepts used in the various domains of Web services. The most important objective of the main OWL ontology is to ensure homogeneity between the different peers through the use of the same semantic language, which is OWL-S. In addition, to improve our solution, we presented a development process that eases the use of the proposed architecture, we offered a number of motivating examples, and we exploited some open-source tools like wsdl2owls and the OWL-S MX Matchmaker. We are now simulating and testing our solution using an unstructured P2P protocol, Gnutella v0.4 [33]. Moreover, we want to determine how the reuse of experiences accelerates the discovery operation. Finally, this work needs to improve some important points. We wish to develop the implemented algorithms using a probabilistic approach to filter the pertinent peers in the network. Also, we hope to improve the QoS of the returned results by proposing some selection criteria for the Web services.
REFERENCES

[1] Gharzouli, M., Boufaida, M.: A Distributed P2P-based Architecture for Semantic Web Services Discovery and Composition. In: 10th Annual International Conference on New Technologies of Distributed Systems (NOTERE), IEEE, Tozeur, Tunisia (2010).
[2] Mandreoli, F., Perdichizzi, A. M., Penzo, W.: A P2P-based Architecture for Semantic Web Service Automatic Composition. In: International Conference on Database and Expert Systems Applications (DEXA), IEEE Computer Society, Regensburg, Germany (2007).
[3] Gharzouli, M., Boufaida, M.: A Generic P2P Collaborative Strategy for Discovering and Composing Semantic Web Services. In: Fourth International Conference on Internet and Web Applications and Services (ICIW), Venice/Mestre, Italy (2009).
[4] Emekci, F., Sahin, O. D., Agrawal, D., El Abbadi, A.: A Peer-to-Peer Framework for Web Service Discovery with Ranking. In: IEEE International Conference on Web Services (ICWS), California, USA (2004).
[5] Sahin, O. D., Gerede, C. E., Agrawal, D., El Abbadi, A., Ibarra, O., Su, J.: SPiDeR: P2P-Based Web Service Discovery. In: Third International Conference on Service-Oriented Computing (ICSOC), Amsterdam, The Netherlands (2005).
[6] Essafi, T., Dorta, N., Seret, D.: A Scalable Peer-to-Peer Approach to Service Discovery Using Ontology. In: 9th World Multiconference on Systemics, Cybernetics and Informatics, Orlando (2005).
[7] Kungas, P., Matskin, M.: Semantic Web Service Composition Through a P2P-Based Multi-agent Environment. In: Agents and P2P Computing (AP2PC), LNAI 4118, Springer-Verlag Berlin Heidelberg (2005).
[8] Hu, J., Guo, C., Wang, H., Zou, P.: Web Services Peer-to-Peer Discovery Service for Automated Web Service Composition. In: ICCNMC, LNCS 3619, Springer-Verlag Berlin Heidelberg, Zhangjiajie, China (2005).
[9] Berardi, D., Calvanese, D., De Giacomo, G., Hull, R., Mecella, M.: Automatic Composition of Transition-based Semantic Web Services with Messaging. In: Very Large Data Bases (VLDB) (2005).
[10] Rageb, K.: An Autonomic <K, D>-Interleaving Registry Overlay Network for Efficient Ubiquities Web Services Discovery Service. J. Information Processing Systems (JIPS), 4(2) (June 2008).
[11] Batra, S., Bawa, S.: Review of Machine Learning Approaches to Semantic Web Service Discovery. J. of Advances in Information Technology (JAIT), 1(3) (August 2010).
[12] Schmidt, A., Winterhalter, C.: User Context Aware Delivery of E-Learning Material: Approach and Architecture. J. of Universal Computer Science (JUCS), 10(1) (January 2004).
[13] Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., Balakrishnan, H.: Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. In: ACM SIGCOMM, California, USA (2001).
[14] Jin, H., Wu, H., Li, Y., Chen, H.: An Approach for Service Discovery based on Semantic Peer-to-Peer. In: ASIAN, Kunming, China (2005).
[15] Vu, L.-H., Hauswirth, M., Aberer, K.: Towards P2P-based Semantic Web Service Discovery with QoS Support. In: Business Process Management Workshops, pp. 18-31, Nancy, France (2005).
[16] Liu, J., Gu, N., Zong, Y., Ding, Z., Zhang, S., Zhang, Q.: Web Services Automatic Composition Based on QoS. In: IEEE International Conference on e-Business Engineering (ICEBE), IEEE Computer Society (2005).
[17] Tang, X., Jiang, C., Ding, Z.: Automatic Web Service Composition Based on Logical Inference of Horn Clauses in Petri Net Models. In: IEEE International Conference on Web Services (ICWS) (2007).
[18] Wang, C., Lu, J., Zhang, G.: Generation and Matching of Ontology Data for the Semantic Web in a Peer-to-Peer Framework. In: APWeb/WAIM, LNCS 4505, Springer-Verlag Berlin Heidelberg (2007).
[19] Haase, P., Siebes, R., van Harmelen, F.: Peer Selection in Peer-to-Peer Networks with Semantic Topologies. In: ICSNW, Paris, France (2004).
[20] Benmerzoug, D., Gharzouli, M., Boufaida, M.: Formalisation and Verification of Web Services Composition based on BPEL4WS.
In: First Workshop of Web Services in Information Systems (WWS 09), Algiers, Algeria (2009).
[21] Zhengdong, Z., Yahong, H., Ronggui, L., Weiguo, W., Zengzhi, L.: A P2P-Based Semantic Web Services Composition Architecture. In: IEEE International Conference on e-Business Engineering, Macau, China (2009).
[22] Paolucci, M., Sycara, K., Nishimura, T., Srinivasan, N.: Using DAML-S for P2P Discovery. In: International Conference on Web Services (ICWS), Las Vegas, Nevada, USA (2003).
[23] Paolucci, M., Kawamura, T., Payne, T. R., Sycara, K.: Semantic Matching of Web Services Capabilities. In: First International Semantic Web Conference (ISWC), Sardinia, Italy (2002).
[24] Web Services Architecture, W3C Working Group, 2004.
[25] Service Finder: a search engine for Web services discovery.
[26] SOAP specification.
[27] WSDL specification.
[28] OWL-S: Semantic Markup for Web Services.
[29] WSMO: Web Service Modeling Ontology.
[30] METEOR-S: Semantic Web Services and Processes.
[31] Semantic Web Central: Project wsdl2owl-s, translator from WSDL to OWL-S, 2004.
[32] Hybrid OWL-S Web Service Matchmaker.
[33] The annotated Gnutella protocol specification v0.4.
[34] Gnutella protocol development v0.6.
[35] Benmerzoug, D., Gharzouli, M., Zerari, M.: Agent Interaction Protocols in Support of Cloud Services Composition. In: Proc. of Holonic and Multi-Agent Systems for Manufacturing (HoloMAS) (2013).
[36] Semantic Web Central: Project sws-tc.
[37] Gharzouli, M., Messelem, Y., Bounas, M. H.: TTL-CHORD: A Chord-based Approach for Semantic Web Services Discovery. Scalable Computing: Practice and Experience (SCPE), vol. 15, no. 1 (2014).

Building Web Service Ontology: A Reverse Engineering Approach
Houda EL BOUHISSI, Department of Computer Sciences, EEDIS Laboratory, Sidi-Bel-Abbes University, Algeria
Mimoun MALKI, Department of Computer Sciences, EEDIS Laboratory, Sidi-Bel-Abbes University, Algeria
Abstract This paper addresses the topic of defining a knowledge-based system for representing Semantic Web Service ontologies according to the WSMO conceptual model and proposes a software engineering approach to this problem using an existing Web Service. The proposal uses a reverse engineering technique supported by a tool and a similarity measure, starting from the WSDL file of an existing Web Service and going all the way to modeling a WSMO ontology specified in the WSML language.
Index Terms Web Service, Semantic Web Service, Ontology, WSDL, WSMO, Reverse engineering.
I. INTRODUCTION
Today, organizations are increasingly forced to modernize using Semantic Web Services, which have in recent years become one of the most effective, efficient, and economical means to build intelligent systems. This migration calls for reverse engineering of Web Services into Semantic Web Services. However, there are few approaches that consider service ontologies as the target of reverse engineering. A majority of the work on reverse engineering has been done on tools that require previous knowledge of the Web Service application; few tools use the WSDL (Web Service Description Language) file as the information resource and a WSMO ontology as the target. As an attempt to fill the gap in this area, we propose a novel approach to reverse engineering Web Services into service ontologies. This paper continues our previous research on domain-knowledge-driven Web Service analysis [1] and describes a reverse engineering process for building a Semantic Web Service ontology according to the WSMO conceptual model.
The proposed approach deals with a software engineering technique that consists of extracting useful information from the WSDL file of an existing Web Service in order to build a Web Service ontology specified in WSML. Our approach is based on the idea that the semantics of a Web Service can be inferred without an explicit analysis of the Web Service code. Rather, these semantics can be extracted by analyzing the WSDL description file, which is the most popular document describing a Web Service application. The semantics are supplemented with domain ontologies and user head knowledge to build the WSMO ontology. Our approach can be applied to migrating a Web Service application, which is usually described by a WSDL file, to the ontology-based Semantic Web. The remainder of this paper is structured as follows: Section 2 summarizes related work on Semantic Web Services. In Section 3, the proposed approach is described in detail. In Section 4, an experimental test of the proposed techniques is reported. Finally, Section 5 concludes the paper and gives future directions of the on-going project.
II. BACKGROUND
Semantics can either be added to currently existing syntactic Web Service standards such as UDDI and WSDL, or services can be described using ontology-based description languages. The major initiatives in the area of SWSs are documented by W3C member submissions, such as OWL-S [2], WSMO [3] and WSDL-S [4]. The Ontology Web Language for Services (OWL-S) semantically describes Web Services using OWL ontologies. The Web Services Description Language - Semantic (WSDL-S) augments the expressivity of WSDL with semantics such as a domain ontology in an arbitrary semantic representation language. The WSDL-S proposal was superseded by Semantic Annotations for WSDL (SAWSDL) [5], which is a restricted and homogenized version of WSDL-S.
The Web Service Modeling Ontology (WSMO) provides ontological specifications for the description of Semantic Web Services. WSMO is the only standard for which several implementation environments exist that aim to support the complete standard. For these reasons, WSMO is used as our Semantic Web Services technology throughout the rest of this paper. We explain the concepts of the WSMO approach in detail next.

III. PROPOSED APPROACH
The main contributions of the approach presented in this paper can be summarized as follows: 1. We propose a conceptual modeling approach for the specification of ontologies; the approach is based on the WSDL file of an existing Web Service and semantic similarity measures using WordNet. 2. Using the proposed approach, we describe a set of Web-based software tools allowing the developer to define WSMO ontologies and the final user to explore them.
A. The WSMO Framework
WSMO involves four components describing semantic aspects of Web Services: Ontologies, Web Services, Goals, and Mediators. Each of these WSMO top-level elements can be described with non-functional properties. Although this paper is related to ontology specification, it briefly describes all the WSMO elements. Ontologies provide a formal and explicit specification of the application domain and of all the data used by the other components. Optionally, they may be described by non-functional properties and may import existing ontologies. The WSMO ontology is composed of: 1. Concepts, describing the ontology domain, possibly organized in a hierarchy; 2. Relations, representing further connections among concepts; 3. Instances of concepts and relations, setting values to their attributes and parameters respectively; 4. Axioms for further definition of concepts and relations through logical expressions. Furthermore, WSMO comes along with a Web Service Modeling Language (WSML) and a Web Service Execution Environment (WSMX).
B. Motivation
Among the graphical user interface tools for building and managing ontologies compliant with WSMO, we distinguish WSMO Studio [6] and the Web Services Modeling Toolkit (WSMT) [7]. WSMO Studio is a WSMO editor for building the aspects of Semantic Web Services, available as Eclipse-based plug-ins.
Its main functionalities include the definition of Ontologies, Web Services, Goals and Mediators, and service composition through graphical user interfaces. We focus our analysis on the ontology editor since our interest is in ontology building. WSMO Studio stores ontologies in WSML, the representation language of WSMO. The tool supports interaction with repositories for importing/exporting ontologies, but editing is done on local copies on the user's machine. At the moment, no extensions are offered to allow concurrent access and editing of a shared repository. The Web Services Modeling Toolkit (WSMT), in turn, is an Integrated Development Environment (IDE) for Semantic Web Services developed for the Eclipse framework. The WSMT aims at aiding developers of Semantic Web Services through the WSMO paradigm by providing a seamless set of tools to improve their productivity. The IDE focuses on three main areas, namely engineering of WSMO descriptions, creation of mediation mappings, and interfacing with Semantic Execution Environments and external systems. These tools produce ontologies in a completely manual manner, where user participation is fundamental at all stages of creation. Both tools rely on the prior knowledge and comprehension of the user; that is, the process involves selecting choices or introducing useful information. This process is costly and time-consuming. In addition, there is currently no automatic or semi-automatic tool for creating WSMO ontologies. For these reasons, this paper describes a proposed approach to create a semi-automatic tool that uses the information provided by the WSDL file to create WSMO ontologies.
C. Methodology
Traditional reverse engineering tools extract knowledge from source code and software documentation [8]. However, this approach is rather limiting, as information concerning how the code was developed and the rationale for its design is often lacking.
Moreover, a piece of source code may be cryptic due to a lack of developer comments. The approach proposed in this paper is to use the description of an existing Web Service (the WSDL file) itself to specify an ontology according to the WSMO conceptual model. Therefore, this proposal proceeds mainly in two principal stages: a reverse engineering stage for the identification of the useful information in the WSDL file, and an engineering stage for the construction of the Web Service ontology according to the WSMO conceptual model. Our approach uses a WSDL file as input and goes through five basic steps: (1) entity identification, to extract useful information from the WSDL file; (2) analysis of the extracted information by applying mapping rules to create the backbone of the ontology; (3) semantic enhancement of the useful information using domain ontologies for applicability and consideration of the standardization problem; (4) building the ontology by translating the formal model into ontology elements specified in the WSML language; and (5) validation of the produced elements. The proposed approach reduces the effort and cost of building a WSMO ontology by reengineering, without paying attention to the source code or how the application has been built. The proposal is divided into the following five high-level

phases of building the Semantic Web Service ontology process (see Figure 1):
a) Phase 1: Entities identification. This phase deals with the information required to create the WSMO ontology (concepts, attributes, relationships, and axioms) starting from an existing WSDL file. In this phase, we are interested in the XML Schema part of the WSDL file, which is a description of the data types of the Web Service's input and output. In this schema, we can find the definition of complex types and simple types and the declaration of elements. They are explained as follows:
Simple type definition: XML Schema provides a wide range of built-in data types, for example string, integer, Boolean, float, decimal, etc.; these are examples of one form of simple type definition. Another form of simple type definition can be used to provide constraints on the values of built-in types. For example, it may be necessary to restrict the allowed values of the positiveInteger data type to a particular maximum value.
Complex type definition: Can be used to: define a data type composed of sub-elements of other data types; define the allowed structure of child elements using the keywords all, sequence and choice; or extend or restrict the definition of an existing complex type. Additionally, the values of elements can be accompanied by constraints on their values.
Attribute declarations: Attributes can be either global or associated with a particular complex type definition. Attributes are an association between a name and a simple type definition. Restrictions (also called facets) are used to define acceptable values for XML elements or attributes. We use the restriction element to indicate the existing (base) type and to identify the 'facets' that constrain the range of values.
The extraction phase is a fully automatic process: we identify the information between the <element> and </element> tags according to the complex type, simple type, and attribute definitions.
We also enumerate all the restriction statement options. This information concerns the name, type, and restriction attributes. All the extracted information is stored in an XML file for further use.
b) Phase 2: Analysis. This phase focuses on the mapping of the information retrieved in the previous phase using mapping rules. The mapping engine deals with a set of transformation rules and analyzes the information. This phase produces a set of concepts and axioms. The definition of each concept is accompanied by its sub-concepts and its components with their attribute names and types. The mapping produced is roughly based on the following rules (see Figure 2):
Fig. 1. System Architecture
Fig. 2. Mapping Rules Definition
Rule 01: Simple type definition. If a simple type is used to create a new derived type, we create a new concept with the same built-in type (see Figure 3). Also, if the simple type participates in the definition of a complex type, it is mapped to a property (attribute) of the complex type with the main built-in type.
<xsd:element name="age">
  <xsd:simpleType>
    <xsd:restriction base="xsd:positiveInteger">
      <xsd:maxExclusive value="35"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
Fig. 3. Example for the Rule 01
Rule 02: Complex type definition. Complex type definitions can contain sub-components that are a mixture of simple elements, attributes and other complex type definitions. We propose to map each complex type to a concept in the WSMO ontology. Sub-components with a simple built-in type are mapped to attributes with the same built-in type, and attributes are likewise mapped to attributes with the same built-in type. If the sub-component is itself a complex type, here we

proceed in a recursive manner: we first create the corresponding concept, and then the sub-components are mapped to attributes with the built-in type (see Figure 4), which contains a definition assumed to be within the XML Schema part of the WSDL file. Finally, a complex type embedded in another complex type is mapped, on one hand, to a sub-concept of the enclosing complex type and, on the other hand, to a concept.
<xs:element name="Customer">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Dob" type="xs:date"/>
      <xs:element name="Address" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Fig. 4. Example for the Rule 02
Rule 03: Attributes. An attribute may be associated directly with the root or embedded in a simple or complex type. If an attribute belongs to the root, we propose to create a new concept with the built-in type. If the attribute is embedded in a simple or a complex type, it is mapped to an attribute of the concept of the complex or simple type.
Rule 04: Restriction element for data types. Each restriction is mapped to an axiom with the corresponding option. As an example of a restriction definition, in Figure 3, the CountryConstraint defines the corresponding value of the Country.
The results of the analysis phase are stored in an XML file which is mainly structured by the tags <complextype>...</complextype> and <simpletype>...</simpletype>. This formalization facilitates the translation into the WSML language.
c) Phase 3: Semantic enhancement. In the process of identifying entities in the document, it is possible that we find values for attributes or relationships that were not previously present in the knowledge base. Enhancing the existing metadata could be as simple as entering values for attributes, in which case it could be automated, or as complex as modifying the underlying schema, in which case some user involvement might be required.
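The Phase 2 mapping of XSD complex types to concepts (Rule 02 above) can be sketched as follows. This is a minimal illustration under assumptions: the function name `map_schema` and the returned dictionary shape are ours, not the paper's mapping engine, and the sketch handles only the flat complex-type case, not the recursive one.

```python
# Sketch of Rule 02: each XSD complex type becomes a concept whose
# simple-typed sub-elements become attributes (name -> built-in type).
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace

def map_schema(xsd_text):
    """Map elements declared with a complexType to {concept: {attr: type}}."""
    root = ET.fromstring(xsd_text)
    concepts = {}
    for elem in root.iter(XS + "element"):
        ctype = elem.find(XS + "complexType")
        if ctype is None:
            continue  # simple element: handled by Rule 01, skipped here
        attrs = {}
        for sub in ctype.iter(XS + "element"):
            t = sub.get("type", "")
            attrs[sub.get("name")] = t.split(":")[-1]  # strip the xs: prefix
        concepts[elem.get("name")] = attrs
    return concepts

# The Customer example of Fig. 4:
xsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Dob" type="xs:date"/>
        <xs:element name="Address" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
print(map_schema(xsd))  # {'Customer': {'Dob': 'date', 'Address': 'string'}}
```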
The third phase of our proposal is the computation of the semantic similarity measure, which aims to quantify how similar two concepts are, one from the XML file produced before and one from the domain ontology. The similarity analysis of these concepts is carried out through the WordNet dictionary, which gives a standardized and complete synonym set and classifies the entities. We use WordNet-based similarity measures [9] such as Path, Resnik, Lin and Jiang. (WordNet is an online lexical database designed for use under program control: English nouns, verbs, adjectives, and adverbs are organized into sets of synonyms, each representing a lexicalized concept.) We use WordNet as a taxonomic reference, so the idea is to compare the concepts of the XML file already created with the domain ontology, using WordNet. Figure 5 depicts the algorithm of the semantic enhancement process. We have an XML file corresponding to the data types of the Web Service, expressed in concepts, attributes, restrictions and types. We consider only the concepts that have no sub-concepts as candidate concepts, because we assume this is enough to identify whether the produced WSMO ontology is well defined. We also have an uploaded ontology containing concepts of some domain, where each concept may have a set of attributes. Subsequently, we compute similarity measures to identify the semantic concepts of the ontology. We define a threshold value for the semantic similarity measure. The threshold is a value between 0 and 1, the value 1 indicating that two entities are completely similar. The threshold value is an important decision point. If it is very low, between [0.0, 0.5], many concepts can be wrongly matched, i.e. false positives can be returned by the function similarity(c1, c2).
On the other hand, if this value is high, between [0.7, 1.0], many concepts which might be matched are not caught, i.e. many false negatives can occur. Second, we must choose a method to compute the semantic similarity measure, as described above. If the semantic similarity measure between the concept of the XML file and the ontology concept is greater than or equal to the threshold, we can consider that the concept belongs to the domain of the ontology concept; therefore this concept can be superseded by the concept of the domain ontology, and finally its sub-concepts are retrieved and added to the XML file.
Algorithm 1: Semantic enhancement
Input: XML file (i candidate concepts); concepts of the domain ontology (j candidate concepts); a similarity measure; a threshold value for the similarity measure
Output: list of concepts
Begin
  Create a vector containing the concepts of the domain ontology (j concepts)
  For each concept Ck of the XML file (k = 1 to i) do
    For each concept COm of the domain ontology (m = 1 to j) do
      Calculate the distance between Ck and COm
      If (the similarity >= the threshold) then
        Supersede the concept Ck by the concept COm
      EndIf
    EndFor
  EndFor
End
Fig. 5. Algorithm 1: Semantic enhancement
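Algorithm 1 can be sketched in executable form as follows. The token-overlap similarity below is a toy stand-in (our assumption) for the WordNet-based measures (Path, Resnik, Lin, Jiang) named in the text, and the function names are illustrative.

```python
# Sketch of Algorithm 1 (semantic enhancement) with a pluggable similarity.

def token_similarity(c1, c2):
    """Toy stand-in for a WordNet similarity measure; returns a value in [0, 1]."""
    a, b = set(c1.lower().split("_")), set(c2.lower().split("_"))
    return len(a & b) / len(a | b)

def enhance(xml_concepts, ontology_concepts, similarity, threshold):
    """Supersede each XML-file concept by the most similar domain-ontology
    concept whose similarity reaches the threshold; keep it otherwise."""
    result = []
    for c in xml_concepts:
        best, best_score = None, 0.0
        for co in ontology_concepts:
            score = similarity(c, co)
            if score >= threshold and score > best_score:
                best, best_score = co, score
        result.append(best if best is not None else c)
    return result

# "customer_address" clears the 0.5 threshold against "customer";
# "book" matches nothing and is kept unchanged.
concepts = enhance(["customer_address", "book"],
                   ["postal_address", "customer"],
                   token_similarity, 0.5)
print(concepts)  # ['customer', 'book']
```

As the text notes, the threshold choice trades false positives (low values) against false negatives (high values); with a real WordNet measure the same loop structure applies unchanged.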

d) Phase 4: Building the ontology. Ontologies and Semantic Web Services need formal languages for their specification in order to enable automated processing. As for ontology descriptions, the W3C recommendation for an ontology language, OWL, has limitations both on a conceptual level and with respect to some of its formal properties. One proposal for the description of Semantic Web Services is WSML, the formal language of WSMO. The WSML ontology is created on the fly according to the content of the enhanced XML file. The WSML ontology mainly consists of: a WSML variant, a namespace, a set of non-functional properties, concepts and axioms. Concepts and axioms are retrieved directly from the XML schema. Table 1 depicts the main transformation rules from XML to the WSML language.
TABLE I. XML to WSML Mapping Rules
XML Element                   WSML Element
Complex_Type constructor      Concept
Concept constructor           Concept
Attribute constructor         Concept
Restriction constructor       axiom
Enumerate constructor         Or operator
Attribute of SimpleConcept    attribute Datatype
Figure 6 presents the translation of the entity Customer of Figure 4. Each attribute is joined to its corresponding value type by the constructor OfType.
Concept Customer
  Dob OfType date
  Address OfType string
Fig. 6. Example of mapping a concept to the WSML language
However, the non-functional properties are introduced semi-automatically by the user from a previously defined list.
e) Phase 5: Validation. All previous steps may introduce wrong concepts and relationships, so an automated validation step is needed. This step is often done by hand. Before registering the ontology, it should be validated to ensure WSML correctness and accuracy. If the ontology does not conform to the WSML language, a domain expert can bring some alterations to correct it. Once the ontology is validated, it is stored in the ontological repository.
IV. EXPERIMENTATION
In order to validate and evaluate our approach, a software tool has been created. The execution engine of this tool has been fully implemented in Java with the NetBeans IDE, because there is a large amount of libraries that ease the retrieval and parsing of web pages and the construction of ontologies. This tool represents the first step of our efforts in developing a general translation engine for building Semantic Web Service ontologies according to the WSMO conceptual model. The tool is called BUOWES (Building Ontology for Web Service), a semi-automatic tool for the translation of a WSDL file into an incomplete WSMO ontology.
Fig. 7. A screenshot of the proposed tool
A screenshot of the BUOWES user interface is presented in Figure 7. BUOWES is a software tool which takes as input a WSDL specification and a domain ontology and returns a WSMO ontology description specified in the WSML language as output. The BUOWES software tool provides a friendly and simple-to-use interface. The user can upload a WSDL file either by its URL or by browsing locally. Then, the wrapper module parses the WSDL file and extracts the XSD (XML Schema Definition) defined between the WSDL type tags. The mapping engine converts the extracted XSDs to terms used by the ontology, according to a set of mapping rules. The mapping engine produces a list of terms which is stored in an XML file to be used next. Candidate entities of the XML file can be superseded by other ones retrieved from the domain ontology by making use of the similarity measure. The user's participation is fundamental: the user must upload the appropriate domain ontology. The building module translates the entities into the WSML specification and generates the WSMO ontology according to the WSML language. The user can visualize the ontology as a tree or as a textual file. Finally, the validator module checks the produced ontology to ensure WSML correctness and accuracy.
Non-functional properties are added by the user, who may choose the relevant non-functional properties from a list of choices (see figure 10). We ran our tool on a data set of 20 WSDL Web services from different domains, showing the impact of the proposed tool in decreasing the time and effort of the building process. 154 IT4OD 2014
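The entity-superseding step described above (replacing candidate XML entities with closer domain-ontology terms) can be sketched as follows; the normalized string similarity from difflib and the 0.8 threshold are stand-in assumptions for the paper's actual similarity measure:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalized string similarity in [0, 1] (a stand-in for the real measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def supersede(candidates, domain_terms, threshold=0.8):
    """Replace each candidate entity by the closest domain-ontology term
    when the similarity exceeds the threshold; keep it unchanged otherwise."""
    result = []
    for entity in candidates:
        best = max(domain_terms, key=lambda t: similarity(entity, t))
        result.append(best if similarity(entity, best) >= threshold else entity)
    return result

# A misspelled entity is corrected; an ambiguous one is left alone:
print(supersede(["Custemer", "Dob"], ["Customer", "DateOfBirth", "Address"]))
```

The threshold trades precision for recall: a lower value corrects more entities but risks superseding a term with an unrelated ontology concept, which is why the user's review of the result remains important.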

Figure 8 depicts an example of a generated ontology for the e-tourism domain.

Fig. 8. Building ontology according to an e-tourism Web Service

V. CONCLUSION

In this paper, we have proposed a novel approach for building Semantic Web Service ontology according to the WSMO conceptual model. The proposed technique uses a reverse engineering process and the WSDL file of an existing Web Service. Our approach is composed of two main aspects: reverse engineering the WSDL file and engineering the WSMO ontology specified in the WSML language; it is based on the idea that the semantics of a Web Service can be extracted by analyzing its WSDL file. These semantics are supplemented with the domain ontologies and user head knowledge to build the ontology. Finally, we have implemented a set of end-user tools based on web interfaces for: the insertion of WSDL files, calculating the similarity measure, browsing the ontological repository by the final user and producing the WSML ontology. The strong point of the proposed approach is that it relies on the WSDL file of an existing Web Service. The use of domain ontologies improves semantic interoperability. The work reported in this paper is part of a larger project; work is still in progress, aimed at extending our approach to build Web Services, Goals and Mediators. We believe that the results of our initial experimentation are quite promising, and we will continue to develop and evaluate this process.

ACKNOWLEDGMENTS

We would like to thank the anonymous reviewers for their valuable comments.

REFERENCES

[1] H. El Bouhissi and M. Malki, "Reverse Engineering Existing Web Service Applications," in Proc. of the 16th Working Conference on Reverse Engineering (WCRE'09), IEEE Computer Society, pp. , ISSN , ISBN , October 13-16, 2009, Lille, France.

[2] D. Martin, M. Burstein, J. Hobbs, O. Lassila, D. McDermott, S. McIlraith, S. Narayanan, M. Paolucci, B. Parsia, T. Payne, E. Sirin, N. Srinivasan and K. Sycara (2004).
OWL Web Ontology Language for Services (OWL-S), W3C Member Submission.

[3] J. de Bruijn, C. Bussler, J. Domingue, D. Fensel, M. Hepp, U. Keller, M. Kifer, B. König-Ries, J. Kopecky, R. Lara, H. Lausen, E. Oren, A. Polleres, D. Roman, J. Scicluna and M. Stollberg (2005). Web Service Modeling Ontology (WSMO).

[4] R. Akkiraju, J. Farrell, J. Miller, M. Nagarajan, M. Schmidt, A. Sheth and K. Verma (2005). Web Service Semantics - WSDL-S, W3C Member Submission.

[5] J. Farrell and H. Lausen (2007). Semantic Annotations for WSDL and XML Schema. W3C Candidate Recommendation. Retrieved from

[6] M. Dimitrov, A. Simov and D. Ognyanov (2005). WSMO Studio - An Integrated Service Environment for WSMO. In Proceedings of the Workshop on WSMO Implementations, Innsbruck, Austria, June 6-7.

[7] M. Kerrigan and A. Mocan (2008). The Web Service Modeling Toolkit. In Proceedings of the 5th European Semantic Web Conference (ESWC'08), Tenerife, Canary Islands, Spain, June 1-5, pp.

[8] X. Wu, A. Murray, M.-A. Storey and R. Lintern (2004). A Reverse Engineering Approach to Support Software Maintenance: Version Control Knowledge Extraction. In Proceedings of the 11th Working Conference on Reverse Engineering (WCRE'04), Washington, DC, USA, pp.

[9] L. Meng, R. Huang and J. Gu (2013). A Review of Semantic Similarity Measures in WordNet. International Journal of Hybrid Information Technology, Vol. 6, No.

Semantic Social-Based Adaptation Approach for Multimedia Documents in P2P Architecture

Adel ALTI, Mourad SAADAOUI, LRSD, University of Setif-1, Setif, Algeria
Makhlouf DERDOUR, LRSD, University of Tebessa, Tebessa, Algeria

Abstract This paper presents an approach to enhance users' experience through the use of recommendations and social networks for on-the-fly (at runtime) adaptation of multimedia documents. The originality of the dedicated social customization of quality adaptation paths is that it relies on both the user's Advanced Semantic Generic Profile and his preferences, with social influence inferred from Facebook as an unstructured P2P environment. The proposed approach has been validated through a prototype for the authors of oral conference presentations. The goal is to improve the assembly of relevant adaptation services and the efficiency and effectiveness of our approach.

Index Terms Context-aware, user profile, services composition, QoE, social computing.

I. INTRODUCTION

The Semantic Social Adaptation Platform (SSAP) is a platform for context-aware social networking of mobile users and as such requires advanced semantic user profiles in order to provide personalization of multimedia content through new services and customization of service qualities. The consideration of advanced semantic generic user profiles masks the heterogeneity of user profiles and services that can come from social networks. We focus especially on information broadcasting and interactive services related to the smart field (home, health, cities, etc.). This raises, among others, problems related to the heterogeneity of devices regarding CPU power, communication mechanisms (GPRS, WiFi, Bluetooth, etc.) and transmission speed, as well as the media variety (sound, video, text and image).
The main adaptation approaches that exist for multimedia documents are [1]: (1) server-side adaptation (e.g., [1]), (2) proxy-based adaptation (e.g., [4]), (3) client-side adaptation (e.g., [6]) and (4) Peer-to-Peer (P2P) adaptation (e.g., [10]). The heterogeneity of such applications and the diversity of user needs may prevent playing specific multimedia contents. In classical approaches [1, 4, 6, 10], the adaptation process supposes that all adaptation services are already available on each mobile platform. Such approaches share the following characteristics:

Predefined quality of service. Adaptation approaches have defined different quality of service (QoS) properties. For instance, some of them minimize the computation cost for producing an adapted document, while others may maximize the proximity between the original and the adapted content. Usually, the quality of service of an adaptation framework is not customizable; in other words, the quality of the adaptation process is usually predefined on a fixed set of properties.

Specific type of context. Current approaches are often focused on a specific type of context. It is clear that other users (in terms of personality and relationships) influence the users that experience the services within a group.

In this paper, we exploit the vision and challenges of peers cooperating to create SSAT. We propose to use the neighborhood's opinion about the service quality in order to provide an adapted solution. We use this neighborhood concept because a user may have experienced access to other specific adaptation services. Moreover, it reduces the search space of remote adaptation services. Our approach follows a layered architecture.
It is composed of three layers: the peer-to-peer layer offers the reconfiguration-level services required by any P2P multimedia adaptation application, like service-oriented reconfiguration, dynamic cooperation among groups, communication between mobile platforms and a contextual deployment strategy. The Kalimucho P2P platform proposed by [7] implements this layer. The generic adaptation core layer provides semantic abstraction to hide the complexity and the heterogeneity of the underlying P2P medium and implements the following functionalities: (1) semantic relevance determination facilities to discover adaptation services, and (2) flexibility for quality assembly of heterogeneous multimedia services. The rest of this paper is organized as follows: Section 2 overviews the main adaptation approaches. Section 3 describes our adaptation architecture. Afterwards, results of an experimental evaluation of semantic social service discovery are presented in Section 4. Finally, Section 5 concludes the paper and presents ideas for future work.

II. RELATED WORKS

Over the last decade, a lot of research has been proposed, usually grouped into four main categories:

Server-side adaptation [4]: devices may request a server to adapt multimedia documents. Such adaptation is under the server's responsibility and may require advanced knowledge and skills about the connected users. In such a situation, the server usually quickly becomes overloaded and time consuming.

Client-side adaptation [6]: each device may be able to adapt documents by itself. However, some clients may not be able to execute multiple adaptation operations at a time due to their limited capacities, e.g., battery.

Proxy-based adaptation [1]: a proxy sits between a client and a server and acts as a mediator. In this case, many communications may be needed, since the proxy negotiates the adaptation to apply.

Peer-to-Peer adaptation [10]: the arrival of peer-to-peer technology strongly contributes to changing the adaptation architectures. Multimedia contents, exchanges and services are indifferently distributed among mobile platforms. Adaptation platforms now play a double role: adaptation service providers and adaptation service requesters. The distributed approach fits better with the different characteristics of heterogeneous mobile platforms. Consequently, this approach takes all the advantages of the peer-to-peer paradigm: load balancing and more service choices.

TABLE I. MULTIMEDIA DOCUMENT ADAPTATION ARCHITECTURES

Architecture type               | Centralized              | Hybrid       | Decentralized
Adaptation management           | Centralized              | Centralized  | Decentralized
Adaptation service distribution | Client-side, Server-side | Peer-to-Peer | Peer-to-Peer

We propose to use semantic and social information to enhance the efficiency and flexibility of the composition of heterogeneous adaptation components, which enables applications on our adaptation platform to better guide the adaptation process. III.
ENHANCING QOE IN QUALITY MODELS AND ADAPTATION PROCESSES

The general structure of the Adaptation Service Quality (ASQ) ontology is presented in Fig. 1 [9]. The proposed structure consists of four parts representing the multimedia adaptation service, the multimedia adaptation process, the Quality of Service (QoS) of the adaptation process and the context user model. The ontology presented herein aims at providing a common model for the heterogeneous technologies usually available in pervasive environments, to share their execution device properties and constraints, to provide semantic service determination facilities and to provide a better solution for quality assembly of heterogeneous multimedia adaptation services fitting as much as possible the device constraints and user preferences. We extend the ASQ ontology of [9] by adding Quality of Experience (QoE) metrics of interest to the requester and social network integration as new basic elements of the ontology. We consider that by incorporating these concepts into ASQ we may be able to provide a method to specify adaptation rules over heterogeneous platforms (smartphone, tablet, laptop, etc.). The heterogeneity concerns both data formats, contents and communication interfaces (Bluetooth, WiFi, ZigBee, etc.).

A. Advanced Semantic Generic Profile

The Advanced Semantic Generic Profile (ASGP) is a user profile used throughout the Semantic Social-Opinion Platform. An overview of ASGP is given in Fig. 2. From top to bottom, ASGP is comprised of three parts aligned with the available user data sources, i.e. document, social and context data sources. From left to right, ASGP is comprised of the parts holding the processed user data itself. The Context Profile (left) holds information that has been acquired and aggregated from the available user data sources. The Inferred Profile (right) holds knowledge about users as a result of reasoning upon the Context Profile.
In smart environments, the Context Profile contains multimedia services that can provide or process any kind of information:

The user facet: this facet contains information about a person. A wide variety of heterogeneous sources can provide such kind of information. Of course, the data sources may be stored locally or remotely (e.g., medical records, relatives or friends, visited places, agendas, ...); thus, the user facet integrates and/or refers to such data. Some parts of the user profile may be public, while other parts may be private (i.e., restricted access).

The device facet: the device facet describes all the capabilities of a smart environment in terms of hardware. For instance, it details which kinds of equipment are available, such as sensors and fixed or handheld devices.

The social facet: the social facet describes the user's friend list retrieved from social networks and social exchanges such as Twitter, Facebook, LinkedIn, etc.

Since our context model refers to services, we can migrate the context from platform to platform while preserving the constraints. Another advantage of this context model is that we do not need to update the model all the time; only specific services may be called to provide the needed information. The Inferred Profile contains knowledge about different user attributes, such as influence, trust and experience; e.g., the attribute experience contains knowledge about social influence, which is inferred by reasoning upon the social facet of the Context Profile. Note that the Context Profile and the Inferred Profile are the only sections of ASGP that contain actual entries to be stored in a database.
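A minimal sketch of the ASGP structure described above; the field names and types are illustrative assumptions, not the paper's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class ContextProfile:
    """Acquired/aggregated data, one dict per facet of the ASGP."""
    user: dict = field(default_factory=dict)    # personal data, agendas, ...
    device: dict = field(default_factory=dict)  # sensors, handheld devices, ...
    social: dict = field(default_factory=dict)  # friend lists from Facebook, Twitter, ...

@dataclass
class InferredProfile:
    """Knowledge derived by reasoning upon the Context Profile."""
    influence: float = 0.0
    trust: float = 0.0
    experience: dict = field(default_factory=dict)

@dataclass
class ASGP:
    """Advanced Semantic Generic Profile: acquired facets plus inferred knowledge."""
    context: ContextProfile
    inferred: InferredProfile

profile = ASGP(ContextProfile(user={"name": "A"}, social={"friends": ["B"]}),
               InferredProfile(influence=0.7))
print(profile.inferred.influence)
```

Splitting acquired facets from inferred attributes mirrors the paper's point that only these two parts need database storage, while facet contents can be fetched on demand from the services they reference.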

Fig. 1. The core of the Adaptation Service Quality ontology [9].

Fig. 2. Advanced Semantic Generic Profile metamodel.

Analysis and reasoning hereby define a step in which context user information is reasoned upon in order to derive new knowledge about the way the user perceived the adaptation service efficiency and quality. This step yields new knowledge of the user in one of two possible ways: reasoning upon a single user through monitoring certain information in the time domain, or reasoning upon multiple users, e.g. calculating a user's influence towards other users.

B. Social Network and QoE Integration

In order for a user to interact with other neighboring users, he has to use the interface that they expose. For this purpose, we developed the Semantic Social Context-aware Platform, which provides an easy mechanism to build social exchanges. It offers an API that enables user authentication and sharing votes through different platforms while hiding all the intricacies of generating signatures and tokens, doing security handshakes, etc. The extended functionality is implemented for Facebook and Twitter and is used by the user mobile application. In addition, the Semantic Social Context-aware Platform includes a sub-part responsible for retrieving data of interest from the social networks during service discovery and using it to calculate Quality of Experience (QoE) metrics of interest to the requester. The evaluation of QoS is defined through an evaluation of the adaptation service (Cost) and an evaluation of the output quality (Benefice). The user is able to customize the quality parameters in order to fulfill his/her needs.
For example, one user may want a high-quality image, while another one may want an energy-saving adaptation with an average media quality. The service scoring the maximum shall be the best available service, while a service scoring less than a specified threshold shall be ignored; similarly, a service gaining a zero weight shall be straightaway discarded.

V(S, i) = QoS(S, i) = Benefice_{media_type, media_characteristics}(S, i)^α / Cost_{energy, load, memory, time}(S, i)^β    (1)

where α and β are coefficients that allow the user to specify the importance of each quality criterion. The quality function is customizable and parameterizable according to the different needs of users and context parameters. The Cost is parameterizable according to context parameters like CPU load, energy saving, low bandwidth, etc., and the Benefice is parameterizable by specific media parameters like compression ratio, frame rate, resolution, etc. Each peer broadcasts the votes attributed to its neighbors. After receiving the votes, each node determines the service weight as follows:

Score_{S,i} = φ·V(S, i) + (1 - φ)·Σ_{S_k ∈ N_i} V(S_k, i)    (2)

where V(S, i) refers to the vote given by user i to service S, with users influenced by other users' votes. The user's weight history is represented by φ and the actual user context by 1 - φ. This metric is widely used in similar user adaptation tasks, as it balances the computation cost and the performance. Each node broadcasts its weight to its neighbors. During the service discovery phase, the nodes use these weights to select a relevant quality adaptation service. The advantage of this voting technique lies in the fact that the importance of a node is determined by the node itself and all its neighbors instead of its local properties alone. The more votes a node accumulates, the more important it is in the entire network. Of course, the discovery module dynamically evaluates the profile (the user preferences and the usage context) and consequently may continuously update the adaptation service lists. The QoE function is used for selecting the best adaptation path, i.e. the one with the highest Benefice/Cost ratio. In this case the values of the quality formula are used for classifying the relevant adaptation paths that have a potential benefit. Among the adaptation paths having the same benefice mark, the evaluation maximizes the users' qualities (expressed as preferences); only the mark of the adaptive criterion (response time, adaptation effort, etc.) is modified. So, reasoning is performed by analyzing finite sets of adaptation paths that have the same benefice mark and differ only by their adaptability cost to the context. In our approach, the information exchange between nodes in the P2P network is depicted in Fig. 4.

Fig. 3. QoE modeling.

IV. SEMANTIC-SOCIAL ADAPTATION PLATFORM

The arrival of the P2P paradigm strongly contributes to changing the adaptation architectures [10].
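A minimal numeric sketch of Eq. 1 and Eq. 2; the α, β, φ values and the sample votes below are illustrative assumptions, not values from the paper:

```python
def qos(benefice, cost, alpha=1.0, beta=1.0):
    """Eq. 1 sketch: V(S, i) = Benefice(S, i)**alpha / Cost(S, i)**beta."""
    return (benefice ** alpha) / (cost ** beta)

def score(own_vote, neighbor_votes, phi=0.6):
    """Eq. 2 sketch: Score_{S,i} = phi*V(S, i) + (1 - phi)*sum of neighbors' votes."""
    return phi * own_vote + (1 - phi) * sum(neighbor_votes)

# A quality-first user (high alpha) vs. an energy-saving user (high beta):
v_quality = qos(benefice=4.0, cost=2.0, alpha=2.0, beta=1.0)  # 16 / 2 = 8.0
v_energy = qos(benefice=4.0, cost=2.0, alpha=1.0, beta=2.0)   # 4 / 4 = 1.0

# A node combines its own vote with the votes broadcast by its neighbors:
s = score(own_vote=v_energy, neighbor_votes=[1.5, 2.25])
print(v_quality, v_energy, s)
```

The same Benefice and Cost yield very different service scores depending on α and β, which is exactly the per-user customization the quality function is meant to provide; φ then balances a node's own experience against its neighborhood's opinion.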
Kalimucho is a technological platform [7] that supports P2P facilities to build a new generation of mobile applications and services that improve interoperability between heterogeneous mobile devices.

Fig. 4. Information exchanges in the P2P network.

Our work benefits from the use of the Kalimucho platform [7] for (re)deploying and supervising reconfigurable distributed adaptation components. This section focuses on the implementation of our generic peer-to-peer semantic context-aware adaptation platform.

A. General architecture

The proposed platform is built according to a layered architecture (Fig. 5). It is composed of three layers: The Kalimucho layer offers service-level functions: a (re)deployment and reconfiguration strategy according to the system (Android, laptop, fixed, CDC, etc.) with QoS requirements, management of groups, dynamic supervision of adaptation components, and communication protocols between mobile nodes. The generic semantic adaptation core layer provides abstractions to hide the complexity and the heterogeneity of the underlying service-based P2P principle and implements a semantic social-based service assembly strategy for each user according to his context, which is inferred automatically from user profiles and inference rules based on the ASQ ontology [2].

B. Enhancing expressiveness and relevance in adaptation processes

In our layered architecture, the semantic P2P access, resource manager, profile manager, conflicts manager and adaptation manager roles are implemented by interlinked modules; the nodes own adaptation capabilities and cooperatively realize the adaptation process.

Semantic P2P access and social exchanges. This module supplies common P2P access at a high level while hiding the complexity and the heterogeneity of the associated low-level operations.

Semantic resource manager. Each peer node stores several adaptation services and multimedia contents on the local file system.
Peers would instantiate classes from the ASQ ontology [2] and publish the resulting individuals as OWL files on their websites. This

ontology defines common concepts as well as relationships among those concepts to represent multimedia data, service parameters (compression ratio, resolution, color number, etc.), semantic service information (service role, media type, action type, version, resource needs, QoS parameters, semantics of input/output, location and time) and context constraints.

Fig. 5. The peer general architecture.

Semantic generic profile manager. This module manages the user profile and user experiments as well. We have used the ASGP vocabulary in order to facilitate the description of constraints between different profile information. The monitoring of the QoE and QoS metrics generated by the Semantic Social Context-aware Platform and the P2P layer is managed by the Experiment Profile Monitoring (EPM) component. Within the EPM, the collected data is stored according to a metric data model into a user profile.

Semantic social-based adaptation manager. There are three sub-modules in this component:

- Semantic-social services discovery module. This module discovers services that match the semantic requirements of the user: inputs/outputs, action types (transcoding, transmoding, transforming, image resizing, language translation, etc.), media type (image, video, sound, text) and specific service context information (user location, user language, user age, screen resolution, battery level, memory size, etc.). Once the discovery finds the matched services, it sends them to the automatic adaptation plan generation module.

- Automatic adaptation plan generation module.
From the set of services discovered by the discovery module and the adaptation guide, this module generates the possible chains of services. In order to establish the sequence of services, our ASQ ontology is used. This ontology provides the correspondences between the roles and the semantic service information.

- Adaptation decision module. This module is responsible for selecting the best path after calculating the score of each adaptation path. The comparison of the adaptation path scores allows us to select the best path. In this case the values of the quality formula are used for classifying the relevant adaptation paths that have a potential benefit.

The main objective of our approach is to improve efficiency and accuracy. This objective is achieved through finite sets of semantically relevant adaptation services and a proper consideration of users' opinions and various user contexts. In the next section, we will experiment with these enhancements.

V. EXAMPLES OF SCENARIOS AND VALIDATION

A. Examples of Scenarios

We have evaluated our approach, which includes the quality service semantic level and the social service level, on local and remote configurations. Several scenarios have been experimented. A first simple one involves a user B who wants to share an oral conference presentation with a user A using a Samsung smartphone. There is a single adaptation service that can semantically resolve resolution conflicts using a colored picture. Thus, no complex adaptation task is required. A second scenario occurs when the adapted document is ready to be executed. User B receives a phone call to join his colleagues at work. A notification is sent to the Adaptation Manager telling it that there is not enough available bandwidth to continue displaying such a video. SSAT calculates scores between social network users with Eq. 2. We get the top potential adaptation service that resolves the bandwidth problem in Fig. 5. The evaluation results mean that the adaptation path Resize MP4 WAV BMP PDF is selected as the best path.
This selection is based on its higher score compared to the other paths under low bandwidth.

B. Evaluation and Discussion

Compared with [1], the computation times of our approach increase more slowly than those of [1] when the service repository grows. This result is practically significant with respect to two aspects:

(1) the QoS balances the benefits of output quality against the revenue of adaptation cost, guided by semantic service relationships; (2) our proposed context-based social modeling reduces the composition time since only a reduced set of relevant services is selected early.

Fig. 5. Experience of the semantic-based application and evaluation results: service discovery and selection, showing candidate services with their scores (TransformingResizeService, Score=2.0; TranscodingServiceImage.BmpToImage.Jpeg, Score=2.25; TranscodingServiceSound.Mp3ToSound.Wav, Score=2.25; TransmodingServiceVideo.AviToSound.Wav, Score=2.0; among other services).

Fig. 6. Response time under various adaptation approaches.

VI. CONCLUSION

SSAT is a semantic social platform that helps users access multimedia documents and organize social exchanges between users. This platform follows a layered architecture and is based on a decentralized semantic peer-to-peer model. In this paper, we discussed issues related to multimedia adaptation. The semantic-social information plays a central role in the discovery, dynamic selection, dynamic composition and substitution of services. The experiments show that our tool improves the adaptation ratio and the response time. In the future, we plan to deploy SSAT on the Kalimucho software platform for deploying reconfigurable distributed applications.

REFERENCES

[1] C. Dromzée, S. Laborie, and P. Roose, "A Semantic Generic Profile for Multimedia Documents Adaptation," in Intelligent Multimedia Technologies for Networking Applications: Techniques and Tools.
IGI Global, 2013, pp.

[2] A. Alti, S. Laborie, and P. Roose, "Automatic Adaptation of Multimedia Documents," International Symposium on Frontiers in Ambient and Mobile Systems (FAMS-2013), in conjunction with the 4th International Conference on Ambient Systems, Networks and Technologies (ANT-2013), pp.

[3] D. Jannach and K. Leopold, "Knowledge-based multimedia adaptation for ubiquitous multimedia consumption," Journal of Network and Computer Applications, 2007; 30(3).

[4] T. Lemlouma and N. Layaïda, "Content Interaction and Formatting for Mobile Devices," in Proc. of the 2005 ACM Symposium on Document Engineering (DocEng'05), pp., ACM Press, November 2005.

[5] H. Ahmadi and J. Kong, "Efficient web browsing on small screens," in Proc. of the Working Conference on Advanced Visual Interfaces (AVI'08), pp., ACM Press, May 2008.

[6] M. Dalmau and P. Roose, "Kalimucho: a distributed software platform for application supervision," 7ème Conférence Francophone sur les Architectures Logicielles, Toulouse, France.

[7] G. Klyne, F. Reynolds, C. Woodrow, H. Ohto, J. Hjelm, M. H. Butler, and L. Tran, "Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies 1.0," W3C Recommendation, January 2004. [Online]. Available:

[8] Ontology Editor,

[9] Q. P. Hai, S. Laborie, and P. Roose, "On-the-fly Multimedia Document Adaptation Architecture," International Workshop on Service Discovery and Composition in Ubiquitous and Pervasive Environments (SUPE), 2012; 10.

[10] Q. P. Hai, S. Laborie, and P. Roose, "On-the-fly Multimedia Document Adaptation Architecture," International Workshop on Service Discovery and Composition in Ubiquitous and Pervasive Environments (SUPE), 2012; 10.

ONTOLOGY DRIVEN GRAPH MATCHING APPROACH FOR AUTOMATIC LABELING BRAIN CORTICAL SULCI

#1 Mohamed Belaggoune, #2 Saliha Oukid, #3 Nadjia Benblidia
LRDSI Laboratory, Computer Science Department, Saad Dahlab Blida University, Blida, Algeria

ABSTRACT

The cortical surface in the human brain has a complex structure comprised of folds (gyri) and fissures (sulci). A precise recognition and labeling of cortical sulci in MRI images is helpful in many human brain studies and applications relating to brain anatomy and function. Due to this structural complexity and inter-subject variability, this is considered a non-trivial task; thus the recognition of cortical sulci requires multiple pieces of knowledge concerning the sulci and their spatial relations. Considering the limitations of both approaches based on low-level descriptors and approaches based on high-level descriptors, we propose a hybrid approach coupling high-level knowledge with low-level knowledge in a graph matching framework based on a local search algorithm, in order to benefit from the advantages and to overcome the limitations of both categories.

Keywords: MRI images, cerebral cortex, sulci labeling, graph matching, ontology.

i. INTRODUCTION

Recently, considerable progress in medical imaging techniques such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET) and magnetoencephalography (MEG) [1], especially in neuroimaging, has contributed a lot both in the medical domain and in the neurology field. MRI is a medical imaging technology which allows cross-sectional views of the body with unprecedented tissue contrast and provides a digital representation of tissue characteristics that can be obtained in any tissue plane [2]. Many computer applications have been introduced to facilitate image processing and analysis, in particular for the delineation or recognition of anatomical structures and other regions of interest.
The external surface of the brain is a thin layer of gray matter called the cerebral cortex; it is highly convoluted [3]. It is made up of many convoluted folds called gyri separated by spaces known as sulci. Sulci serve as important macroscopic landmarks to distinguish the different functional areas of the brain, and gyri serve as the structural units of the cortex. Recognition of cortical sulci in MRI images is important to localize activation sites in functional imaging, to describe morphological changes of brains affected by diseases and in neurosurgical planning, and it is helpful in educating biologists and clinicians; see Figure 1 [4][5][6]. Recognition of cortical sulci in MRI images is a hard problem which requires anatomical expertise, and it is time consuming because these structures have a very complex geometry and their shape varies from one individual to another. Figure 2 illustrates well the inter-subject variability and the variation between the right and left hemispheres in the same subject; thus a scheme for automatic recognition and labeling is necessary. Various approaches have been proposed to tackle this challenging problem. Each approach is based on some a priori knowledge, generally low-level descriptors about the cortical sulci, and a recognition strategy. The large variation of cortical folding patterns makes automatic labeling a challenging problem, which cannot be solved by numeric knowledge (low-level descriptors) alone. Therefore, multiple pieces of knowledge concerning the sulci are required. A graph is a powerful tool for pattern representation and recognition in various fields. The primary advantage of graph-based representation is that it can easily represent patterns and relationships among data. In this work we propose a hybrid approach based on a graph representation coupling high-level knowledge with low-level knowledge in a graph matching framework based on a local search algorithm.
We follow the ontology paradigm to represent high-level knowledge about sulci and their spatial relations, whereas the low-level knowledge is captured in a graph model. Section 2 provides an overview of current methods for recognition and labeling of cortical sulci. Section 3 presents the overall proposed approach and its components. Section 4 concludes and outlines future work. 162 IT4OD 2014

Figure 1 Recognition and labeling of cortical sulci [14]. Figure 2 Inter-subject variability between two subjects (a, b) and variation in shape between the right and left hemispheres of the same subject. ii. OVERVIEW OF RECOGNITION AND LABELING OF CORTICAL SULCI APPROACHES A. Deformable atlas registration based approaches The principle consists of warping a brain model (where all anatomical structures have been labeled manually by an expert) onto a studied brain, thereby transferring labels from the model to the appropriate locations on the test brain. In the following we present the most important approaches based on this paradigm. In [7], the authors elastically deform a reference labeled brain volume to fit the patient brain scan. The brain atlas is parameterized with B-spline surfaces and is deformed under an energy minimization scheme. The energy attracts the atlas fold points to fold points on the patient image and the remaining points to the patient brain surface. Finally, atlas labels are transferred to the patient brain surface. In [8], the authors proposed a method to progressively match an atlas-labeled mesh to the patient brain surface mesh, from the largest folds to the smallest, and then transfer the labels from the matched mesh to label the patient mesh. These approaches are simple and have demonstrated successful identification of some major sulci, but their modeling of structural variability is limited and convergence is not guaranteed. B. Statistical model based approaches With the development of image processing methods, and to overcome the problems of techniques based on deformable atlas registration, several alternative approaches were proposed in which sulci are matched against models from a training database based on characteristics such as shape, location or structure. In the following we present some approaches based on this paradigm. 
In [9], the authors used a nearest-neighbor algorithm to define the classes of sulcal regions, using anatomical landmarks such as gyri as features. In [10], the authors propose a Markovian framework to recognize sulci, in which they use the probabilistic Statistical Parametric Anatomy Map (SPAM) model as prior information about sulci locations, returning the probability of presence of each sulcus at a given 3D location, together with the SVR algorithm to learn shapes and local sulcal patterns. In [11], the authors propose a method to automatically label sulcal basins; it employs a model containing both a volumetric description of the mean shape and the possible variations in the location of each sulcal basin, obtained from a set of training images. In [12], a technique was proposed for automatically assigning a neuroanatomical label to both sulci and gyri, using probabilistic information about their spatial distribution estimated from a manually labeled training set. Statistical model based approaches are frequently used in this context because they are simple to implement and rely on strong mathematical theories that represent inter-individual variability well. Although they provide good results, they fail on some points: - Part of the information about the spatial relations between cortical structures is implicit in the images and is not used. - Training is based on a limited number of subjects. C. Graph based approaches To overcome the limitations of statistical model based approaches, graph-based approaches were developed to represent spatial relations and neighborhood information. In these approaches sulci are represented by nodes in a graph, and the arcs connecting them represent their relationships to one another. This representation is supposed to include all the information required to identify the cortical structures. In fact, working at a higher level of representation leads to more efficiency for the

pattern recognition process and is very powerful for anatomical structures presenting high inter-individual variability. In [13], a method relies on a statistical model of cortical sulci including an average curve, representing the average shape and position of each sulcus, and a search area accounting for its spatial variation domain. The model is represented by a graph. The recognition procedure is based on a sub-graph isomorphism search between a model graph and sub-graphs made of primitives extracted from the input image. The results are good for the primary sulci (e.g. the central sulcus). In [14], the authors introduced a sulci recognition system that relies on an abstract structural representation of the cortical folding patterns and their relationships as a graph. This method can be interpreted as a graph matching approach driven by the minimization of a global function made up of local potentials. Each potential is a measure of the likelihood of the labeling of a restricted area and is given by a multilayer perceptron trained on a learning database. In [15], the authors presented a brain mapping approach to map sulcus labels from a brain atlas to individual brains. The topological and geometric information of the sulci is stored in a topology graph in which each node contains the position and size of the associated sulcus and each edge contains connectivity information between sulci. The mapping is performed by matching the topology graph nodes of the atlas brain and of the user brain. In [16], an approach was proposed for joint sulci detection on cortical surfaces using graphical models and boosting techniques that incorporate shape priors of major sulci and their Markovian relations. Each sulcus is represented as a node in the graphical model, and the sulci are detected jointly via the solution of an inference problem on graphical models. 
Another graph-based approach was taken in [17] for labeling major sulci, using prior statistical information about location, orientation and shape (via geometric moment invariants) and neighborhood structure. Labeling is recast as a graph matching problem between a model graph of sulcal features and their neighborhood relationships and a candidate graph composed of sulcal segments obtained from the test subject. The combinatorial optimization problem is solved using a genetic algorithm. The advantage of this approach is its rich structural-context representation of sulcal segments, including location, orientation and shape: the relative position of sulci is highly invariant, and using neighborhood relationships is very advantageous for identification. Its shortcoming is that it is not robust in pathological cases. Approaches based on numerical priors provided by atlases, statistical models or graphs are not robust and are highly sensitive to deformations caused by pathological processes such as tumors, because they are built only on healthy brains, are imprecise in the treatment of new cases, and consider only the stable gyri and sulci. D. Ontology based approaches In [18], a hybrid system for labeling anatomical structures in brain MRI was proposed. The system involves both numerical knowledge from an atlas and symbolic knowledge represented in a rule-extended ontology written in standard web languages, together with symbolic constraints. The system combines this knowledge with graphical data automatically extracted from the images. Labels of the parts of sulci and of gyri located in a region of interest selected by the user are obtained by reasoning based on Constraint Satisfaction Problem solving combined with Description Logics inference services. This approach is robust in the presence of pathology and yields labels with high precision, but it is not fully automatic. 
Our approach combines key concepts from the ontology-based and graph-based approaches; to our knowledge, no existing approach employs ontologies in a graph matching algorithm for the recognition and labeling of cortical sulci in MRI images. iii. METHOD 1. Overview To address the problem of labeling cortical sulci in MRI images, we begin by segmenting sulcal segments from the cortical surface; low-level descriptors and the spatial relations between segments are then computed to construct a graph representation. Collaborating with a domain expert, we construct an ontology representing high-level knowledge about cortical sulci, modeling their qualitative description and the relations between them. A model graph representation Gm is constructed from a training database of MRI brain images in which sulci have been manually recognized and labeled by a neuroanatomist. We then recognize the sulci of a new subject by selecting suitable candidates from the set of sulcal segments to form sulci, and build a similar graph Gc from them so as to maximize the similarity between Gm and Gc. The selection of sulcal segments is based on local search and driven by the sulcus ontology. Based on neighborhood information from the ontology and on low-level features, we recognize sulci sequentially. For example, if we have recognized the central sulcus and want to recognize the precentral sulcus, which is anterior to it according to the ontology, we have to search for a candidate sulcal segment set ψi that lies anterior to the central sulcus and maximizes the similarity with the model. The same procedure recognizes all sulci neighboring the central sulcus. 2. High level knowledge modeling Collaborating with a domain expert and taking advantage of existing work, we have represented high-level knowledge about cortical sulci in the form of an ontology in which we model their qualitative description and the relations between them. The main sources that helped us in this modeling are the FMA ontology (Foundational Model of Anatomy) [20] and the ontology of Dameron [21], as well as experts in neuroanatomy. Note that our ontology is not generic: it is specific to our approach, and the concepts and relationships defined in it concern only the description of sulci. Here sulci are defined by their relationships and their neighborhood. For example, a sulcus is defined by the sulci it is adjacent to, the lobe to which it belongs, the sulcus with which it is continuous, etc. The concepts used are SulcalFold, Sulcus, SulcusSegment and Lobe. Four groups of relations are defined: subsumption, mereological, topological and continuity relations. Subsumption relations express that a specific anatomical entity is a kind of another; mereological relations express that a specific anatomical entity is an anatomical part of another; topological relations express that two anatomical entities are adjacent; and continuity relations express that a specific anatomical entity is connected to another. 3. Sulci extent region map A given sulcus is located in a relatively constant region on the cortical surface. We define this region based on the locations of the instances of the same sulcus type labeled by the expert in the surface meshes of the training database; the map is obtained by registering the sulci into a common coordinate system. 
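The four relation groups described above can be given a minimal executable encoding. The following sketch is purely illustrative: the concept names follow the paper, but the relation instances (e.g. `anterior_to`) and the helper function are hypothetical stand-ins, not the authors' actual ontology.

```python
# Toy encoding of the sulcus ontology: concepts plus the four relation groups.
# Relation instances and sulcus names are illustrative assumptions.
ONTOLOGY = {
    "concepts": {"SulcalFold", "Sulcus", "SulcusSegment", "Lobe"},
    # subsumption: "is a kind of"
    "is_a": {("Sulcus", "SulcalFold")},
    # mereological: "is an anatomical part of"
    "part_of": {("SulcusSegment", "Sulcus"), ("Sulcus", "Lobe")},
    # topological relation with orientation (anterior neighbor)
    "anterior_to": {("precentral_sulcus", "central_sulcus")},
    # continuity: connectedness between sulci
    "connected_to": {("central_sulcus", "sylvian_fissure")},
}

def neighbors_in_direction(ontology, sulcus, relation):
    """Return the sulci standing in the given spatial relation to `sulcus`."""
    return {a for (a, b) in ontology[relation] if b == sulcus}

# Which sulcus should be searched for anterior to the central sulcus?
candidates = neighbors_in_direction(ONTOLOGY, "central_sulcus", "anterior_to")
# -> {"precentral_sulcus"}
```

Querying the ontology this way is what lets the local search pick, for each already-recognized sulcus, the direction in which the next candidate segment set should be assembled.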
4. Sulci segmentation Several processing steps are required before the segmentation of sulcal segments: registration of the 3D MRI brain images to the stereotaxic coordinate system, tissue segmentation, separation of the cerebral hemispheres, gray/white matter interface reconstruction, and segmentation of the sulcal segments. All these processing steps are performed using the methods described in [19]. 5. Low-level descriptors computation 5.1 Geometric features - Geometric moment invariants: they are used as descriptors of sulcus shape. From the regular moments of orders 0 to 2, 13 rotation-invariant moments are derived [17]. Suppose the origin of the coordinate system is shifted to the centroid of segment S composed of vertices v_s. The regular moment of order p, q, r for a function ρ is defined as M_pqr = Σ_{v∈S} x_v^p y_v^q z_v^r ρ(x_v, y_v, z_v). We put ρ(x,y,z) = 1, because we are interested only in shape. - Depth: we compute the distance between the bottom of the sulcus and the surface of the brain. - Direction: the normalized vector from the starting point to the ending point. - Position: we compute the center of gravity to describe the position. 5.2 Spatial relations computation - Relative position: we compute the distance between adjacent sulci as the distance between their centers of gravity. - Relative orientation: defined as the scalar product of the two normalized main vectors of the two sulci. - Connectivity: we compute whether a pair of sulci is connected. 6. Sulcal segment region extent The region extent of each sulcal segment is computed as follows: the sulcal segments are realigned into the sulci extent region map and, for each sulcal segment ss_i, we compute how much ss_i overlaps with the region extent of each sulcus. The overlap degree Od of segment ss_i with sulcus S_j is given by Od(ss_i | S_j) = (1/n_s) Σ_{v=1}^{n_s} [v_ss ∈ S_j], where n_s is the number of vertices of segment ss_i and [v_ss ∈ S_j] is 1 when vertex v of the segment falls inside the extent region of S_j and 0 otherwise. 
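As a concrete reading of the two formulas above, here is a small pure-Python sketch. The point-wise `in_region` membership test is our assumption about how the extent map is queried; the vertex data are made up for the example.

```python
def regular_moment(vertices, p, q, r):
    """Regular moment M_pqr = sum over vertices of x^p * y^q * z^r (rho = 1),
    with the origin shifted to the centroid of the segment (Section 5.1)."""
    n = len(vertices)
    cx = sum(v[0] for v in vertices) / n
    cy = sum(v[1] for v in vertices) / n
    cz = sum(v[2] for v in vertices) / n
    return sum((x - cx) ** p * (y - cy) ** q * (z - cz) ** r
               for (x, y, z) in vertices)

def overlap_degree(segment_vertices, in_region):
    """Od(ss_i | S_j): fraction of the segment's vertices falling inside the
    extent region of sulcus S_j. `in_region` is an assumed point-wise
    membership test derived from the sulci extent region map."""
    inside = sum(1 for v in segment_vertices if in_region(v))
    return inside / len(segment_vertices)

pts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
m100 = regular_moment(pts, 1, 0, 0)               # first-order central moment -> 0.0
od = overlap_degree(pts, lambda v: v[0] >= 2.0)   # half the vertices -> 0.5
```

Note that shifting to the centroid makes the first-order moments vanish by construction, which is why the invariants in [17] are built from orders 0 and 2.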
7. Graph representation We have a set of sulcal segments, so we construct a graph that stores this set and their features, in which nodes correspond to sulcal segments and edges indicate the neighborhood relationships between them. 
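The segment graph can be sketched as a small attributed-graph structure. This is a minimal sketch under our own assumptions: the feature names and the Gaussian-style similarity helper are illustrative, not the paper's exact implementation.

```python
import math

class SegmentGraph:
    """Attributed graph: nodes are sulcal segments carrying low-level
    descriptors, edges are neighborhood relationships (illustrative sketch)."""
    def __init__(self):
        self.features = {}   # segment id -> feature dict
        self.adj = {}        # segment id -> set of neighboring segment ids

    def add_node(self, seg_id, **features):
        self.features[seg_id] = features
        self.adj.setdefault(seg_id, set())

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

def feature_similarity(f, g, keys=("depth",)):
    """Toy unary similarity between two feature dicts; the paper only
    requires some SIM over feature vectors, so this form is an assumption."""
    d2 = sum((f[k] - g[k]) ** 2 for k in keys)
    return math.exp(-d2)

gc = SegmentGraph()
gc.add_node("ss1", depth=12.3)
gc.add_node("ss2", depth=9.8)
gc.add_edge("ss1", "ss2")
```

The same structure serves for both the candidate graph built from a test subject and, with aggregated statistics as node attributes, the model graph built from the training database.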

8. Sulcus model graph Sulci have been manually identified and labeled on a training database of MRI brain images by a neuroanatomist. All images of the database are registered in the same coordinate system (Talairach space). For each sulcus, low-level descriptors and neighborhood information are computed to form a feature vector f. The nodes of the model graph Gm represent the Ns sulci, and the node attributes of each sulcus are the averages, standard deviations, and minimum and maximum values of the components of the feature vectors of the instances of that sulcus type. The edges of Gm indicate the neighboring relationships between sulci. 9. Labeling sulci Because of over-segmentation, the nodes of the subject graph are sulcal segments, while the nodes of the model graph Gm are labeled sulci. We therefore need to assemble each sulcus from a candidate set of sulcal segments and construct from these sets a graph Gc that is structurally equivalent to Gm and can be compared to it using a similarity function. The process is thus recast as an optimization problem: find the candidate sulcal segment sets such that Gc best matches Gm. The process is described in the following algorithm. 10. Objective function Let S1, S2, ..., Sn denote the n sulci we want to recognize, represented by the model graph Gm, and let SC1, SC2, ..., SCn be the n sets of sulcal segment candidates for sulci S1, S2, ..., Sn, respectively. A segment node v can belong to only a single sulcus: SCi ∩ SCj = ∅ (i, j = 1..n; i ≠ j). The objective function FO is defined as: FO(SC1, ..., SCn) = SIMf(SC1, ..., SCn) + SIMe(SC1, ..., SCn), with SIMf(SC1, ..., SCn) = Σ_{i=1}^{n} SIM(fSCi, fsmi) and SIMe(SC1, ..., SCn) = Σ_{i=1}^{n} SIM(feSCi, fesmi), where SIMf and SIMe are the unary feature-based and neighborhood-based similarity values. 11. Local search ontology driven algorithm a- For each test sulcal segment, compute the overlap degree with each sulcus using the sulci extent region map. b- Construct the sulcal segment candidate set SCi for sulcus Si by collecting connected sulcal segments that have their greatest overlap degree with that sulcus. c- Compute the similarity between each candidate set SCi and the sulcus model smi from the model graph, SIM(fSCi, fsmi). d- Take the candidate set SCi that has the best similarity with its sulcus model as the reference for starting the search. e- Get the neighborhood relation of the reference sulcus Si with a sulcus Sj (i ≠ j) from the ontology. f- Using the neighborhood relation and the candidate set SCi selected in step d, search for the best candidate set SCj for sulcus Sj in orientation O with respect to Si: for each sulcal segment node in SCj do: if adding a neighbor node to SCj increases SIM(fSCj, fsmj) then add this neighbor node to SCj; end if; end for. Repeat steps d to f until no new sulcal segment is added. iv. CONCLUSION In this work we have proposed an approach for automatic cortical sulci labeling based on a graph representation, coupling high-level knowledge with low-level knowledge in a graph matching framework based on a local search algorithm. The high-level knowledge consists of an ontology describing sulci and their spatial relations, whereas the low-level knowledge consists of a priori statistics about geometric features represented in a graph model. The large variation of cortical folding patterns makes automatic labeling a challenging problem that cannot be solved by numeric knowledge (low-level descriptors) alone, which is why we integrate several forms of a priori knowledge concerning the sulci. Future work includes validating our method on many subjects and comparing it with other approaches. We also intend to exploit spatial relations involving more than two sulci and to improve the graph matching algorithm. v. REFERENCES [1] D. L. Pham, C. Xu, J. L. Prince. Current Methods in Medical Image Segmentation. Annual Review of Biomedical Engineering, 2. [2] N. Tirpude, R. R. Welekar. A Study of Brain Magnetic Resonance Image Segmentation Techniques. IJARCCE, Vol. 2, Issue 1, January. 

[3] K. Aloui et al. Characterization of a human brain cortical surface mesh using discrete curvature classification and digital elevation model. J. Biomedical Science and Engineering. [4] A. Klein, J. Tourville. 101 labeled brain images and a consistent human cortical labeling protocol. Frontiers in Neuroscience, Brain Imaging Methods, vol. 6. [5] N. Tzourio-Mazoyer, P. Y. Herve, B. Mazoyer. Neuroanatomy: Tool for functional localization, key to brain organization. NeuroImage, vol. 37, no. 4. [6] P. M. Thompson et al. Mapping cortical change in Alzheimer's disease, brain development, and schizophrenia. NeuroImage 23, S2. [7] S. Sandor and R. Leahy. Surface-based labeling of cortical anatomy using a deformable atlas. IEEE Trans. Med. Imag. 16(1). [8] S. Jaume, B. Macq, and S. K. Warfield. Labeling the brain surface using a deformable multiresolution mesh. In Proc. MICCAI. [9] K. J. Behnkeb et al. Automatic classification of sulcal regions of the human brain cortex using pattern recognition. Medical Imaging: Image Processing, Proceedings of SPIE. [10] M. Perrot, D. Rivière, J. F. Mangin. Identifying cortical sulci from localization, shape and local organization. In ISBI. [11] G. Lohmann, D. Y. von Cramon. Automatic labeling of the human cortical surface using sulcal basins. Medical Image Analysis. [12] B. Fischl et al. Automatically parcellating the human cerebral cortex. Cerebral Cortex, vol. 14. [13] N. Royackkers, M. Desvignes, M. Revenu. Une méthode générale de reconnaissance de courbes 3D : application à l'identification de sillons corticaux en imagerie par résonance magnétique. Traitement du Signal, vol. 15. [14] D. Riviere, J. F. Mangin, D. P. Orfanos, J. M. Martinez, V. Frouin, J. Regis. Automatic recognition of cortical sulci of the human brain using a congregation of neural networks. Med. Image Analysis 6. [15] F. Vivodtzev, L. Linsen, B. Hamann, K. I. Joy, B. A. Olshausen. 
Brain mapping using topology graphs obtained by surface segmentation. In Scientific Visualization: The Visual Extraction of Knowledge from Data, Springer, Berlin, Heidelberg. [16] Y. Shi, Z. Tu, A. Reiss, R. A. Dutton, A. D. Lee, A. Galaburda, I. Dinov, P. M. Thompson, A. W. Toga. Joint sulci detection using graphical models and boosted priors. In Information Processing in Medical Imaging, Lecture Notes in Computer Science, vol. 4584, Springer, Berlin. [17] F. Yang, F. Kruggel. A graph matching approach for labeling brain sulci using location, orientation, and shape. Neurocomputing. [18] A. Mechouche, C. Golbreich, X. Morandi, B. Gibaud. Ontology-Based Annotation of Brain MRI Images. In American Medical Informatics Association Annual Symposium, Washington, USA. [19] J. F. Mangin, V. Frouin, I. Bloch, J. Regis, J. Lopez-Krahe. From 3D magnetic resonance images to structural representations of the cortex topography using topology preserving deformations. Journal of Mathematical Imaging and Vision 5(4). [20] C. Rosse, J. L. V. Mejino Jr. A reference ontology for biomedical informatics: the foundational model of anatomy. J. of Biomedical Informatics, vol. 36, no. 6. [21] O. Dameron, B. Gibaud, X. Morandi. Numeric and Symbolic Representation of the Cerebral Cortex Anatomy: Methods and preliminary results. Surgical and Radiologic Anatomy, vol. 26, no. 3. 

Ontology construction process based on automatic knowledge extraction from structured texts Driouche Razika, Kemcha Nawel, Bensassi Houda Département TLSI, Faculté des NTIC, Université Constantine 2, Algeria. Abstract Building an ontology manually is a tedious and costly task, since it requires identifying the potential concepts and relations and then placing these elements. Knowledge extraction from texts addresses the problem of managing and processing a mass of texts that exceeds human capabilities. In order to design semantically richer ontologies, this work proposes a process called PECCO for extracting knowledge from structured texts, with the goal of building a domain ontology in which the concepts and relations are extracted from a set of text corpora representing patients' medical reports. We use terminological extraction tools such as R.TeMiS and TreeTagger for part-of-speech tagging, and Protégé OWL for the implementation of the ontology. Keywords Knowledge extraction, Corpus, Ontology, Methontology, Process. I. INTRODUCTION The masses of textual data available today raise a difficult problem related to their management and processing. Text mining and processing methods can partly answer such a problem: they consist of modeling and then implementing methodologies applied to textual data in order to determine its meaning and discover new knowledge. 
Most of the proposed methodologies rely on linguistic or statistical approaches [17], [18]. Given the enormous volume of knowledge that users in the medical domain are expected to possess, great efforts must be made to retain all this knowledge and to be able to use it constantly. The rapid evolution of information in this discipline and the need for sharing and reuse have led to the development of medical ontologies [8]. Building an ontology manually is a tedious and costly task, since it requires identifying the potential concepts and relations and then placing these elements. The extraction of knowledge and its use for ontology construction must therefore be automated. Indeed, as the mass of information in the various modeled domains evolves rapidly, it becomes necessary to validate the ontology so that it reflects current reality as closely as possible. Our work falls within this framework. This paper proposes a process for extracting knowledge from a medical knowledge base extended with the knowledge of a gynecology department, in order to build a domain ontology. This provides good structuring, organization and modeling of the knowledge, and it has led us to build a system that represents all the knowledge of the domain. We rely on current Semantic Web standards and tools: the designed model is based on the OWL standard and on the Protégé OWL and Racer tools. This paper is organized as follows. Section 2 presents knowledge engineering, while Section 3 covers aspects of knowledge extraction. Section 4 describes some ontology construction methods. Section 5 then details our proposed process, called PECCO. 
This process is built around two main parts: knowledge extraction and ontology construction. Finally, Section 6 concludes our work and gives some perspectives for future work. II. KNOWLEDGE ENGINEERING Knowledge engineering (KE) has its origins in artificial intelligence (AI). It has evolved since the 1970s and is increasingly concerned with the problems of knowledge acquisition and modeling. It offers a scientific method for analyzing and manipulating knowledge, and proposes concepts, methods and techniques for modeling, formalizing and acquiring knowledge in an organization, for the purposes of operationalization, structuring or management in the broad sense [1]. A large number of KE methods were originally dedicated to the creation of knowledge-based systems. From a long list of such methodologies, we cite: the SAGACE method [2], the CommonKADS method [3], the REX method [4], and the MASK method [5]. III. KNOWLEDGE EXTRACTION Knowledge extraction is the set of processes for discovering and interpreting regularities in data. Three types can be distinguished: A. Knowledge discovery in databases (KDD), or data mining This is the activity of analyzing a set of raw data in order to extract exploitable knowledge. Knowledge items are elements that possess a syntax and a semantics formalized in a knowledge representation language [6]. B. Knowledge extraction from the web, or web mining This is the set of techniques that aim to explore, process and analyze the large masses of information generated by Internet activity; it derives from applications of data mining techniques. Web mining may target the data presented on web sites, in which case we speak of text mining, or server-side data, whose exploitation is closer to classical data mining techniques [7]. C. Knowledge extraction from text, or text mining Introduced in the mid-1990s under the term Knowledge Discovery in Textual Databases (KDT), it addresses the problem of managing and processing a mass of texts that exceeds human capabilities; the process begins with the modeling of the texts and ends with the interpretation of the extraction results and the enrichment of the knowledge [8]. IV. ONTOLOGY CONSTRUCTION Born from knowledge representation needs, ontologies are currently at the heart of work in knowledge engineering. Building ontologies requires both a study of human knowledge and the definition of representation languages, as well as the realization of systems to manipulate them. 
KE has thus given rise to ontological engineering, in which the ontology is the key object of study. Several works propose ontology construction methods; among them we note ENTERPRISE [9], TOVE [10] and METHONTOLOGY [11]. A. ENTERPRISE Enterprise is based on experience in building ontologies in the field of enterprise management [9]. B. TOVE Tove leads to the construction of a logical model of knowledge [10]. C. METHONTOLOGY Methontology aims at ontology construction at the knowledge level and distinguishes the following steps: specification, conceptualization, formalization, implementation [11]. Although manual, the approaches presented impose a framework for defining ontologies, and they are generally partly reused in semi-automatic ontology construction methodologies. The mixed, or semi-automatic, method combines the manual and automatic approaches by asking users, generally domain experts, to verify the semantic structure defined automatically in a first step. Among the many existing works on semi-automatic ontology creation, we focus our analysis on approaches based on text analysis [12]. Methods for building ontologies from texts often favor the analysis of the text. We present the TERMINAE method [13], which aims to build ontologies in a subfield of knowledge extraction involving linguistic processing and distinguishes the following steps: requirements analysis, corpus constitution, linguistic analysis, normalization, semantic formalization [13]. V. THE PECCO PROCESS We propose a knowledge extraction and ontology construction process called PECCO (Processus d'Extraction de Connaissance et Construction d'Ontologie). 
Our process is composed of four main phases: specification; linguistic study and knowledge extraction; ontology construction; and finally enrichment of the developed ontology (Fig. 1). The PECCO process begins with the ontology specification phase, which describes the ontology and establishes its requirements. The second phase is the linguistic study and knowledge extraction; it comprises the following steps: 1) preprocessing and analysis, 2) term extraction, 3) cleaning and filtering, 4) classification. The third phase is the construction of the ontology, with the steps: 1) conceptualization, 2) formalization and implementation. The last phase concerns the enrichment of the ontology, so as to follow the evolution of the domain concerned. 
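The four phases above can be summarized as a schematic pipeline. This is a bare skeleton under our own simplifying assumptions: the placeholder bodies below do not reproduce the actual tooling (R.TeMiS, TreeTagger, Protégé OWL), and all names are illustrative.

```python
# Schematic, hypothetical skeleton of the four PECCO phases.
def specify(domain):
    """Phase 1: describe the ontology and establish its requirements."""
    return {"domain": domain, "users": ["physicians", "health workers"]}

def extract_terms(corpus):
    """Phase 2 placeholder: preprocessing, term extraction, cleaning/filtering
    (the real process uses R.TeMiS and TreeTagger)."""
    words = (w.strip(".,;").lower() for w in corpus.split())
    return sorted({w for w in words if len(w) > 3})

def build_ontology(spec, terms):
    """Phase 3: conceptualization, then formalization and implementation."""
    return {"spec": spec, "concepts": terms, "relations": []}

def enrich(ontology, new_terms):
    """Phase 4: keep the ontology up to date with the domain."""
    ontology["concepts"] = sorted(set(ontology["concepts"]) | set(new_terms))
    return ontology

spec = specify("gynecology")
terms = extract_terms("Consultation de la patiente, grossesse et accouchement.")
onto = enrich(build_ontology(spec, terms), ["echographie"])
```

The point of the skeleton is the data flow: the specification constrains extraction, extraction feeds construction, and enrichment reuses the same extraction step on new corpora.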

176 Fig. 1. Processus Pecco. A. Spécification de l'ontologie Cette phase a pour but de fournir une description claire du problème étudié et d'établir un document de spécification des besoins. Nous dériverons l'ontologie à construire à travers les cinq (5) aspects suivants : 1) Le domaine de connaissance L e domaine d'étude concerne la spécialité de gynécologie en médecine. 2) L'objectif opérationnel L 'objectif sert à structurer les termes du domaine et aider l'agent de santé dans le cadre de son travail. 3) Les utilisateurs : Les médecins et les agents de santé du service gynécologie sont censés utilisés notre ontologie. 4) Les sources d'informations : Il s'agit de sélectionner parmi les différentes sources de connaissance celles qui permettront de répondre aux objectifs de l'étude. Dans le cadre de notre travail, Les textes utilisés sont sélectionnés à partir d'un ensemble des rapports médicaux que nous avons collecté afin de former des corpus de texte en langage naturel. Nous nous sommes basés aussi sur plusieurs travaux voisins dans le domaine des ontologies [6][8][12]. 5) La portée de l'ontologie Parmi les termes, nous citons : patiente, consultation, traitement, grossesse, accouchement, échographie, nouveauné etc. B. Etude linguistique et extraction Cette phase consiste à traiter le corpus brut afin d'avoir un corpus traité. Ce dernier est utilisé afin d'extraire les termes. Par la suite, une opération de nettoyage est nécessaire afin d'éviter les doubles et les mots vides et aboutir à un corpus nettoyé. Une deuxième opération de filtrage va nous servir à construire notre dictionnaire de terme. Ensuite, l'étape de classification sert à répartir les termes en deux (2) listes, l'une pour les relations et l'autre pour les concepts candidats. Nous allons présenter en détail les quatre (4) étapes de l'étude linguistique et l'extraction. 1) Prétraitement et analyse Le prétraitement vise à définir une stratégie pour traiter les données manquantes. 
It consists in normalizing the text to obtain consistent results and, as far as possible, in correcting certain errors and making missing information explicit, sometimes with the help of external resources. This step normalizes the various ways of writing the same word, corrects obvious spelling mistakes and typographic inconsistencies, and makes explicit certain lexical information expressed implicitly in the texts. The textual (linguistic) analysis of the corpus systematizes and streamlines the search for conceptual data in the texts. The analysis is carried out with the help of a linguistics expert. We also used the Reverso online spell checker [16] to obtain an error-free corpus.
2) Term extraction: extracting semantic elements consists in drawing out the important concepts, and the relations between those concepts, that we will use throughout the development of the domain ontology. A term is a phrase: it is made up of one or several words taken together in a syntactic construction considered an unbreakable unit. Such a term only takes on meaning with respect to the context in which it is used (trade, technical field, scientific field, etc.), i.e., a specialty domain. Term extraction therefore aims to inventory all the terms contained in a corpus; the result of an extraction is a list of terms. This step is performed automatically with two tools: R.TeMiS, version 10.6 [14], and TreeTagger, version 3.2 [15].
R.TeMiS 10.6 is a tool for creating, manipulating, and analyzing text corpora and extracting terms from them. The input texts must be organized as sentences and

saved in a (txt) file. The advantage of this tool is that it computes the number of occurrences of each term, which helps us choose the appropriate terms for building our medical ontology, favoring a term that appears many times over another. R.TeMiS also extracts the stop words, which makes their removal easier [14].
TreeTagger 3.2 is a part-of-speech tagging and lemmatization tool. It assigns to each term of a text its morpho-syntactic category (noun, verb, adjective, article, proper noun, pronoun, abbreviation, etc.) and gives each term its lemma. The input texts must be organized as sentences and saved in a (txt) file. TreeTagger will help us classify terms into concepts and relations and eliminate stop words [15].
3) Cleaning and filtering: the cleaning step improves the quality of the obtained result by eliminating stop words and duplicates. The objective is to correct, or work around, the errors that slipped into the corpus content. Cleaning provides a clean corpus from which the terms useful for ontology construction can be properly extracted. Within statistical methods, several measures are commonly used to select candidate terms; among them are the number of occurrences of a term in a corpus, as well as more complex measures such as mutual information, tf-idf, the T-test, or statistical distributions of terms [17]. However, all these techniques only detect new concepts: none of them can place the concepts precisely within the ontology, nor determine whether relations exist between them.
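The occurrence-count cleaning and filtering described above can be sketched in a few lines. This is a minimal illustration in Python (our choice; the paper's pipeline uses R.TeMiS), with a deliberately tiny, hypothetical stop-word list and threshold:

```python
from collections import Counter

# Hypothetical, tiny French stop-word list; a real run would use a full lexicon.
STOP_WORDS = {"la", "le", "les", "de", "des", "une", "un", "et", "est"}

def filter_terms(corpus_sentences, min_count=2):
    """Count term occurrences, drop stop words, and keep only terms
    frequent enough to be candidate terms (duplicates collapse in the Counter)."""
    counts = Counter(
        token.lower()
        for sentence in corpus_sentences
        for token in sentence.split()
    )
    return {
        term: n for term, n in counts.items()
        if term not in STOP_WORDS and n >= min_count
    }

sentences = [
    "la patiente suit un traitement",
    "le traitement de la grossesse",
    "la grossesse de la patiente",
]
print(filter_terms(sentences))  # → {'patiente': 2, 'traitement': 2, 'grossesse': 2}
```

The surviving terms (patiente, traitement, grossesse) are exactly the kind of domain-relevant candidates the paper keeps for the term dictionary.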
Methods based on syntactic analysis, for their part, use the grammatical functions of a word or group of words within a sentence; they assume that grammatical dependencies reflect semantic dependencies [18]. Other approaches use syntactic patterns [19]. The extracted terms then reveal not only new concepts but also the existence of relations between them. In our work, we use the number of occurrences of terms to perform the filtering. Filtering selects the appropriate terms (candidate terms) among those just extracted; these terms must be relevant to our field of study, gynecology.
4) Classification: in this step, we classify the extracted semantic elements (terms) into two categories: concepts and relations. To do so, we use the tagging result produced by TreeTagger to sort each term by its category. We thus retain only the terms and relations needed to build the domain ontology. Accordingly, we classify NOM (nouns), ADJ (adjectives), and ABR (abbreviations) as concepts, and terms of type VER (verbs) as relations; terms of type KON (conjunction), PRO (pronoun), DET:ART (article), DET:POS (possessive pronoun), PRO:REL (relative pronoun), etc., are treated as stop words to be eliminated.
C. Ontology construction
The medical ontology to build concerns the gynecology department. To this end, we follow the steps of an ontology construction process that draws on the Methontology method [11] and the TERMINAE method [13]. TERMINAE builds an ontology from terms extracted from resources.
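The classification rule above (NOM/ADJ/ABR tags become concept candidates, VER tags become relation candidates, the rest are stop words) can be sketched as follows; this is an illustrative Python snippet, and the helper name `classify` is ours, not part of the TreeTagger toolchain:

```python
# Tag sets follow the paper: NOM, ADJ, ABR -> concepts; VER -> relations;
# KON, PRO, DET:ART, etc. -> stop words to discard.
CONCEPT_TAGS = {"NOM", "ADJ", "ABR"}
RELATION_TAGS = {"VER"}

def classify(tagged_terms):
    """Split (term, tag) pairs from a tagger into concept and relation candidates."""
    concepts, relations = [], []
    for term, tag in tagged_terms:
        base_tag = tag.split(":")[0]  # e.g. "DET:ART" -> "DET"
        if base_tag in CONCEPT_TAGS:
            concepts.append(term)
        elif base_tag in RELATION_TAGS:
            relations.append(term)
        # anything else (KON, PRO, DET:ART, ...) is treated as a stop word
    return concepts, relations

tagged = [("patiente", "NOM"), ("examine", "VER"), ("la", "DET:ART"),
          ("échographie", "NOM"), ("et", "KON")]
print(classify(tagged))  # → (['patiente', 'échographie'], ['examine'])
```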
Methontology aims to give a definition of each concept and each relation of the ontology in a DL language. It specifies the steps to follow to build an ontology: 1) conceptualization, 2) formalization, 3) implementation.
1) Conceptualization: this consists in identifying and structuring the domain knowledge from the information sources. Knowledge acquisition can rely both on document analysis and on interviews with domain experts. Once the concepts are identified by their terms, their semantics is described in a semi-formal language through their properties and instances. To do so, we distinguish the following main activities:
a) Building a glossary of terms.
b) Building the concept hierarchies.
c) Building the binary-relation diagram.
d) Building the concept dictionary.
e) Building the binary-relation table.
f) Building the attribute table.
g) Building the logical-axiom table.
h) Building the instance table.
i) Building the assertion table.
The conceptualization step is thus a set of activities leading to a set of semi-formal intermediate representations called a conceptual ontology. The glossary of terms collects and describes all the candidate terms extracted and filtered during the term-extraction step that are useful and usable for building the domain ontology. A concept hierarchy organizes a group of concepts as a taxonomy: it shows how the concepts defined in the glossary of terms are arranged in a hierarchical order expressing the (subclass, superclass) relations. A binary relation links two concepts (a source concept and a target concept).
This activity therefore consists in building a binary-relation diagram that graphically represents the various relations existing between the concepts of the same or of different hierarchies.

Fig. 2. Binary-relation diagram.
The concept dictionary contains the domain concepts; for each concept we define its known instances, its attributes, and its synonym concepts. Building the binary-relation table defines, for each relation used in the binary-relation diagram: the relation name, the source and target concept names, the inverse relation name, and the source and target cardinalities. Building the attribute table specifies, for each attribute included in the concept dictionary, the set of constraints and restrictions on its values. Building the axiom table defines the concepts by means of logical expressions. Each axiom comprises: the name of the concept the axiom bears on, a definition in natural language, and a logical expression (fig. 3).
Fig. 3. LOGICAL-AXIOM TABLE
Concept: Médecin
Description: a physician works in a health facility, examines patients, prescribes treatments, and screens for diseases.
Logical expression: ∀X. Médecin(X) → ∃Y. Structure-santé(Y) ∧ Travaille(X, Y) ∧ ∃Z. Patiente(Z) ∧ Examine(X, Z) ∧ ∃W. Traitement(W) ∧ Prescrit(X, W)
Building the instance table describes the known instances already identified in the concept dictionary; for each instance, one specifies the instance name, the name of the concept it belongs to, its attributes, and the values associated with them. Assertions state the existence of relations between instances. Building the assertion table lists, for each source concept appearing in the binary-relation table and for each instance of that source concept defined in the instance table, the instances of the target concept that are related to the instances of the source concept.
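Two of the intermediate representations above (the concept dictionary and the binary-relation table) can be pictured as simple records. The following Python sketch is purely illustrative: the class and field names are ours, not prescribed by Methontology, and the `Médecin`/`Examine` entries echo the running example:

```python
from dataclasses import dataclass, field

@dataclass
class ConceptEntry:
    """One row of the concept dictionary: a concept with its known
    instances, attributes, and synonym concepts."""
    name: str
    attributes: list = field(default_factory=list)
    instances: list = field(default_factory=list)
    synonyms: list = field(default_factory=list)

@dataclass
class BinaryRelation:
    """One row of the binary-relation table: name, source/target concepts,
    inverse relation, and source/target cardinalities."""
    name: str
    source: str
    target: str
    inverse: str
    source_card: str = "1..*"
    target_card: str = "1..*"

medecin = ConceptEntry("Médecin", attributes=["nom", "spécialité"])
examine = BinaryRelation("Examine", "Médecin", "Patiente", "EstExaminéePar")
print(examine.inverse)  # → EstExaminéePar
```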
2) Formalization: this step formalizes the conceptual model obtained in the previous step in an ontology-representation formalism such as description logic [20]. The formalization step is a set of phases that, starting from the conceptual model defined previously, leads to a formal ontology. The result is a description-logic knowledge base made of two parts: a TBox and an ABox.
3) Implementation: different tools have been proposed to support the manual design of ontologies. They allow editing an ontology, adding concepts and relations, etc., and they integrate different formalization languages (RDF, OWL). Protégé OWL is used to implement the medical ontology. It is a modular interface for editing, visualizing, and checking (constraint verification) ontologies [21]. In addition, this tool can express the ontology in the OWL knowledge-representation language and verify it with the RACER reasoner.
4) Validation testing: the validation step tests and validates the ontology, as well as its software environment and its

documentation. Consistency, satisfiability, and subsumption problems are then checked through the connection to the RACER inference engine. Once the ontology is built and validated by the RACER tests, it is ready to be queried by user requests. The system goes through several steps to generate a query in the formal language nRQL [21]. Consider an example ontology query. The search query in natural language: "Who is the physician who examines the patient Majri Sonia?" Translation into nRQL by applying the generation rules: <Médecin, Examine, patiente_majri_sonia> (?x Médecin patiente_majri_sonia Examine). The query is then sent to the RACER reasoner, which executes it and returns the following result: (((?x Médecin_Abdelli_Riadh))).
D. Enrichment and maintenance
Ontology enrichment is a phase of searching for new concepts and relations and placing them within the ontology [18]. It requires extracting terms representative of the domain of our ontology, identifying lexical relations between terms, and placing the new terms in our ontology. Maintenance of the domain ontology ensures a better representation of it through up-to-date knowledge: indeed, the semantic content of new texts (as well as of old ones) is better structured thanks to the integration of the key domain terms into the ontology.
VI. CONCLUSION
This paper addresses two problems: knowledge extraction from medical reports, and the construction of an ontology for the same medical domain. For this purpose, we described and applied a process for developing an ontology defined in the OWL language and based on knowledge extraction.
The proposed process is complete in that, starting from raw data viewed as a reservoir of knowledge, it leads to an operational ontology. To this end, several phases are followed in order to make the development of the ontology explicit and to guide it, and several tools are involved in order to progressively arrive at the semantics of the statements. Once the knowledge-extraction phase was finished, we moved to the domain-ontology construction step. We then tested and validated our ontology, noting the usefulness of enrichment for an evolving domain. Some perspectives can be envisaged for future work: i) enriching the ontology with other concepts so that it is more expressive and thus truly reflects the studied domain; ii) developing algorithms based on semantic similarity for the contextual-filtering step of the knowledge-extraction phase; iii) integrating the ontology with other medical-domain ontologies, for example endocrinology, in order to provide value-added services.
REFERENCES
[1] J.L. Ermine, «Enjeux, démarches et processus de la gestion des connaissances», Actes des journées francophones d'ingénierie des connaissances IC2000, Toulouse, France.
[2] P.J.
SAGACE, «Une représentation des connaissances pour la supervision de procédés», systèmes experts de 2ème génération, France.
[3] J. Breuker, «CommonKADS library for Expertise Modelling», IOS Press, Amsterdam.
[4] P. Malvache, «Mastering Corporate Experience with the REX method», Proceedings of the International Symposium.
[5] J.L. Ermine, «Enjeux, démarches et processus de la gestion des connaissances», Actes des journées francophones d'ingénierie des connaissances IC2000, Toulouse, France.
[6] U. Fayyad and S. Piatetsky, «From data mining to knowledge discovery in databases», Artificial Intelligence Magazine.
[7] cture_du_web
[8] A. Bodnari, L. Deléger, B. Grau, C. Grouin, T. Hamon, T. Lavergne, A. L. Ligozat, A. Névéol, X. Tannier, and P. Zweigenbaum, «Making Sense of Medical Texts», Forum STIC Paris-Saclay.
[9] M. Uschold and M. King, «Towards a methodology for building ontologies», Workshop on Basic Ontological Issues in Knowledge.
[10] M. Gruninger et al., «The role of competency questions in enterprise engineering», IFIP WG5.7 Workshop on Benchmarking.
[11] M. Fernandez-Lopez et al., «Methontology: from ontological art towards ontological engineering», Proceedings of the AAAI97 Spring Symposium.
[12] B. Bachimont, «Engagement Sémantique et Engagement Ontologique : Conception et Réalisation d'Ontologies en Ingénierie des Connaissances», in J. Charlet, M. Zacklad, G. Kassel & D. Bourigault (Eds.), Ingénierie des connaissances, évolutions récentes et nouveaux défis, Paris: Eyrolles.
[13] N. Aussenac-Gilles and B. Biébow, «Modélisation du domaine par une méthode fondée sur l'analyse de corpus», Actes de la conférence IC 2000, journées francophones d'ingénierie des connaissances.
[14]
[15]
[16]
[17] P. Velardi and M. Missikoff, «Using text processing techniques to automatically enrich a domain ontology», Proceedings of ACM-FOIS.
[18] R. Bendaoud, «Construction et enrichissement d'une ontologie à partir d'un corpus de textes», RJCRI'06.
[19] M.A.
Hearst, «Automatic acquisition of hyponyms from large text corpora», Proceedings of the 14th International Conference on Computational Linguistics, vol. 2, p. 539.
[20] F. Baader et al., The Description Logic Handbook: Theory, Implementation and Applications, Cambridge University Press.
[21]

Reasoning over decomposed fuzzy description logic
GASMI Mohamed, Department of Computer Science, University of M'sila, Algeria
BOURAHLA Mustapha, Department of Computer Science, University of M'sila, Algeria
Abstract — To meet the need to represent and reason with fuzzy ontologies in the Semantic Web context, and given the need to decompose a fuzzy ontology into several sub-ontologies in order to optimize the fuzzy reasoning process, this paper proposes DF-ALC (Decomposing fuzzy ALC). The main contribution of this work is to decompose the ontology axioms into sub-axioms according to a certainty degree assigned to fuzzy concepts and roles, to define the syntax and semantics, and to propose a local tableau algorithm and a way of using bridges to infer across the different local TBoxes.
Keywords — Semantic Web, ontology, description logics, fuzzy logic, inference and automated reasoning
1. Introduction
Taking semantics into account is also essential for information retrieval and query evaluation on the Web. Much work from the Semantic Web community has aimed to describe the semantics of applications by building ontologies. Indeed, the Semantic Web is a Web on which users and researchers place great hopes; whether in information retrieval, e-business, competitive intelligence, etc., its mission is to give meaning to data and to let machines analyze and understand the information that circulates on it. Having meaning first requires describing this information (creating metadata), then trying to link the pieces together through inference and deduction rules in order to build ontologies.
Ontologies are thus central to the Semantic Web, which, on the one hand, seeks to rely on models of Web resources built from conceptual representations of the domains concerned and, on the other hand, aims to let programs draw inferences over them. Tolerating uncertainty and imprecision can only be done by fuzzy systems. "We need a Semantic Web that can give us guarantees, and in which we can reason using logic" [TIM98]: so says Tim Berners-Lee, founder and chairman of the World Wide Web Consortium, where he tries to show that all this metadata is created by humans and therefore carries much uncertainty and imprecision, which carries over to the construction of ontologies. Because fuzzy logic was designed to solve problems of imprecision and uncertainty in a flexible way, researchers had the idea of integrating this logic into the Semantic Web in general, and of using it in the construction of ontologies with description logics in particular. Description logics are a good model for describing the semantics of Web data: their restrictions are what make it possible to obtain scalable reasoning algorithms that detect inconsistencies or logical correlations between data or data sources and compute the set of answers to conjunctive queries; on the other hand, they are very weak when one wants to model a domain whose knowledge and information are vague and imprecise. For this reason, there have been many proposals to extend description logics with mathematical theories that handle the uncertain and the imprecise; the result is the birth of fuzzy description logics.
To reflect our objectives, and after presenting our motivation, this article is organized as follows. Section 3 gives basic notions of the DL ALC and preliminaries on fuzzy DL and distributed DL. Section 4 presents our proposed description logic. Section 5 details the reasoning method over a decomposed fuzzy DL; finally, the article ends with a conclusion and perspectives.
2. Motivation
Fuzzy description logic does not seek precision in statements; on the contrary, it seeks answers to vague propositions, which involve a certain uncertainty (fuzziness). For example, in classical logic, to the question "Is this person tall?" one can only answer

true, if that is the case, or false otherwise. With fuzzy logic, one can represent the cases where the person is very short, somewhat short, average, not very tall, tall, etc. However, the reasoning problems considered over ontologies have often taken a secondary place in fuzzy knowledge bases: in most cases, researchers focus their efforts on the method of representing uncertain knowledge, concentrating on the mathematical notions and fuzzy set theories. The existing works that address reasoning in fuzzy knowledge bases content themselves with small KBs, without giving importance to the optimization of the reasoning algorithm. To reach efficient processing of fuzzy KBs, we opt for organizing the knowledge into categories of axioms according to characteristics that specify a subdomain of the general domain of the ontology. The specific categories represent subsets of the axioms composed of fuzzy concepts (roles) of the ontology. This structuring is represented with distributed description logic, while reasoning runs in parallel over these sub-categories of axioms, which reduces the search space on the one hand and reduces reasoning time on the other. In our contribution, axioms may also be composed of concepts (roles) belonging to two different categories; fuzziness is represented by an annotation attached to each concept and each role, and is handled with the fuzzy-set notions proposed by Zadeh. We thus distinguish two types of axioms, intra-category and inter-category, using the notion of bridge.
3. Preliminaries
3.1 Description logic
Description logics [BAA07], [BAA91], [BAA99], [BAA11], [NUT97] form a family of knowledge-representation languages that can represent the knowledge of an application domain in a structured and formal way. A fundamental characteristic of these languages is that they have a formal semantics. Description logics are used in many applications. They share a common base, AL, enriched with various extensions: the description logic ALC, the subject of this work, adds negation to AL, thus making it a modal extension of propositional logic; other extensions add the transitive closure of roles, number restrictions on roles, the notion of sub-role, etc. Description logics use the notions of concept, role, and individual. Concepts correspond to classes of individuals; roles are relations between these individuals. A concept and a role have a structured description defined from a certain set of constructors. Description logics distinguish two levels of processing: the terminological level (TBox), the generic (global) level, true in all models and for every individual; and the assertional level (ABox), which provides instances of concepts and roles.
Syntax. Let NC be a set of concept names and NR a set of role names.
The set of ALC concepts is built by induction according to the following grammar:
C, D ::= A (atomic concept) | ⊤ (universal concept, top) | ⊥ (empty concept, bottom) | ¬A (negation of an atomic concept) | C ⊓ D (conjunction of concepts) | C ⊔ D (disjunction of concepts) | ∀r.C (universal quantifier) | ∃r (untyped existential quantifier)
where A ∈ NC and r ∈ NR.
Semantics. A semantics is associated with concept and role descriptions: concepts are interpreted as subsets of an interpretation domain Δ^I, and roles as subsets of the product Δ^I × Δ^I. An interpretation I is essentially a pair (Δ^I, ·^I), where Δ^I is called the interpretation domain and ·^I is an interpretation function that associates with a concept C a subset C^I of Δ^I and with a role r a subset r^I of Δ^I × Δ^I. In mathematical notation, it is defined as follows:
⊤^I = Δ^I
⊥^I = ∅
(¬C)^I = Δ^I \ C^I
(C ⊓ D)^I = C^I ∩ D^I
(∃r.C)^I = {x ∈ Δ^I | ∃y : (x, y) ∈ r^I ∧ y ∈ C^I}
(∀r.C)^I = {x ∈ Δ^I | ∀y : (x, y) ∈ r^I → y ∈ C^I}
3.2 Fuzzy description logic
Fuzzy logic appeared in 1965 at Berkeley, in the laboratory of Lotfi Zadeh [ZED65], with the theory of fuzzy sets. It is an extension
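Over a finite domain, the set-theoretic semantics above can be computed directly: concepts are sets, roles are sets of pairs, and each constructor is evaluated exactly as in the definition of ·^I. The following Python sketch (the domain and the `Patiente`/`Examine` names are illustrative) makes this concrete:

```python
DELTA = {"a", "b", "c"}  # a tiny interpretation domain Δ^I

def neg(c):        return DELTA - c                       # (¬C)^I = Δ^I \ C^I
def conj(c, d):    return c & d                           # (C ⊓ D)^I = C^I ∩ D^I
def disj(c, d):    return c | d                           # (C ⊔ D)^I = C^I ∪ D^I
def exists(r, c):  # (∃r.C)^I: x with at least one r-successor in C
    return {x for x in DELTA if any((x, y) in r and y in c for y in DELTA)}
def forall(r, c):  # (∀r.C)^I: x all of whose r-successors are in C
    return {x for x in DELTA if all(y in c for y in DELTA if (x, y) in r)}

Patiente = {"a", "b"}
Examine = {("c", "a")}  # c examines a
print(exists(Examine, Patiente))  # → {'c'}
# note: individuals with no Examine-successor satisfy ∀Examine.C vacuously
print(forall(Examine, Patiente))
```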

of classical set theory that accounts for sets defined imprecisely. Unlike classical logic, fuzzy logic allows a statement to be in a state other than true or false: the statement can be true or false to some degree, taken from a truth space (e.g., "Mohamed is tall"). One cannot establish that a statement is completely true or false because it involves a vague concept, such as "tall", which has no precise definition.
Definition. Let X be a set. A fuzzy subset A of X is defined by a membership function μ_A on X with values in the interval [0, 1].
In fuzzy logic, three different logics are usually distinguished: Łukasiewicz, Gödel, and product logic; the popular Zadeh logic is a sublogic of Łukasiewicz. These logics propose different operators for conjunction, disjunction, negation, and implication. They are shown in Table 1.
Logic        | x ⊗ y          | x ⊕ y        | x ⇒ y               | ¬x
Łukasiewicz  | max(x+y−1, 0)  | min(x+y, 1)  | min(1−x+y, 1)       | 1−x
Gödel        | min(x, y)      | max(x, y)    | 1 if x≤y, else y    | 1 if x=0, else 0
Product      | x·y            | x+y−x·y      | 1 if x≤y, else y/x  | 1 if x=0, else 0
Zadeh        | min(x, y)      | max(x, y)    | max(1−x, y)         | 1−x
Fuzzy description logics (fuzzy DLs) are extensions of classical description logics; they have been proposed as languages able to represent and reason over vague or imprecise knowledge [BOB11], [STR13].
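The operators of Table 1 are plain one-line functions on truth degrees in [0, 1]; a minimal Python sketch for three of the four families (the function names are ours):

```python
# Łukasiewicz t-norm, t-conorm, and implication
def luk_and(x, y):   return max(x + y - 1.0, 0.0)
def luk_or(x, y):    return min(x + y, 1.0)
def luk_imp(x, y):   return min(1.0 - x + y, 1.0)

# Gödel conjunction and implication
def godel_and(x, y): return min(x, y)
def godel_imp(x, y): return 1.0 if x <= y else y

# Zadeh negation and (Kleene-Dienes) implication
def zadeh_not(x):    return 1.0 - x
def zadeh_imp(x, y): return max(1.0 - x, y)

# "Mohamed est grand" to degree 0.7, "Mohamed est fort" to degree 0.4:
print(luk_and(0.7, 0.4))    # ≈ 0.1
print(godel_and(0.7, 0.4))  # → 0.4
print(zadeh_imp(0.7, 0.4))  # → 0.4
```

Note how the two conjunctions disagree on the same degrees (0.1 vs. 0.4), which is exactly why the choice of underlying logic matters for a fuzzy DL.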
These extensions have gained considerable attention in recent years, on the one hand because they are indispensable for applications that are imprecise by nature, such as multimedia analysis, geospatial applications, and many others; on the other hand because they can be applied to Semantic Web applications, such as the representation of ontologies modeling domains whose knowledge is imprecise. Several fuzzy extensions of description logics have been proposed as formalisms able to capture and reason over imprecise and vague knowledge [STO05], [STR06]. To support fuzzy DL inference services, different reasoning algorithms have been proposed, such as tableau-based algorithms [STO07], [STR01], as well as techniques that reduce fuzzy DL reasoning to classical DL reasoning [STR04], [STR06].
3.3 Distributed description logic
Borgida and Serafini propose a Distributed Description Logic (DDL) [BOR03], which generalizes description logic with a local-model semantics, to represent knowledge bases (ontologies) in distributed environments and to support reasoning over these KBs. Using the same principle as Distributed First Order Logics (DFOL), distributed description logics (DDL) allow linking and reasoning with multiple ontologies on the Semantic Web. Bridges in distributed description logics are restricted to relations between concepts, roles, and individuals of different ontologies. Their semantics above all allows deducing subsumption relations that a single ontology would not yield.
Syntax. A network of ontologies in DDL is composed of several description-logic knowledge bases, whose syntax was presented above. In addition, the ontologies are linked together by means of bridge rules.
These are represented in the abstract syntax as follows.
Definition. Let Oi and Oj be two ontologies. A bridge rule from Oi to Oj (i ≠ j) is an expression of one of the following forms:
i:X ⊑→ j:Y is an into-bridge rule;
i:X ⊒→ j:Y is an onto-bridge rule;
i:a ↦ j:b is an individual correspondence;
where i:X and j:Y are either concepts or roles of Oi and Oj, respectively, i:a is an individual of Oi, and j:b is an individual of Oj.
Semantics. In a DDL network of ontologies, each ontology is assigned a description-logic interpretation; several description logics may coexist at each node of the network. To link the knowledge of two ontologies, DDL uses domain relations.
Definition. Let Δ_i and Δ_j be two interpretation domains. A homogeneous domain relation r_ij from i to j is a subset of Δ_i × Δ_j. For every d ∈ Δ^Ii, r_ij(d) denotes the set {d' ∈ Δ^Ij | ⟨d, d'⟩ ∈ r_ij}; for every D ⊆ Δ^Ii, r_ij(D) denotes the set ⋃_{d∈D} r_ij(d); and for every R ⊆ Δ^Ii × Δ^Ii, r_ij(R) denotes ⋃_{⟨d,e⟩∈R} r_ij(d) × r_ij(e). A domain relation r_ij represents one possible way of matching the elements of Δ_i with elements of Δ_j, from the point of view of j.
Definition. A domain relation r_ij satisfies a

homogeneous bridge rule with respect to two local interpretations Ii and Ij (written ⟨Ii, r_ij, Ij⟩ ⊨ rp) if and only if:
⟨Ii, r_ij, Ij⟩ ⊨ i:X ⊑→ j:Y if and only if r_ij(X^Ii) ⊆ Y^Ij;
⟨Ii, r_ij, Ij⟩ ⊨ i:X ⊒→ j:Y if and only if r_ij(X^Ii) ⊇ Y^Ij;
⟨Ii, r_ij, Ij⟩ ⊨ i:a ↦ j:b then ⟨a^Ii, b^Ij⟩ ∈ r_ij.
A heterogeneous relation indicates an association between an element of a concept and the reification of a relation, so it associates an object of the domain with a pair of objects.
Definition (heterogeneous domain relations). Let Ii and Ij be two interpretations. A concept-role domain relation cr_ij from i to j is a subset of Δ_i × (Δ_j × Δ_j). A role-concept domain relation rc_ij from i to j is a subset of (Δ_i × Δ_i) × Δ_j. The relation rc_ij represents one possible way of reifying relations between objects, while cr_ij represents the inverse process.
In our work, we introduce the notion of fuzziness based on the formalism of fuzzy DLs. Classical DLs are interpreted through classical set-theoretic notions: set, binary relation, membership, etc. The fuzzy extensions of DLs have a semantics expressed through fuzzy set theory: whereas in classical set theory an element either belongs to a set or does not, in fuzzy set theory an element belongs to a set to a certain degree. More formally, let X be a set of elements; a fuzzy subset A of X is defined by a membership function μ_A(x) that assigns to every x ∈ X a value between 0 and 1, representing the degree to which this element belongs to A. Fuzzy DLs differ from one another mainly in the means by which they introduce fuzziness, i.e., in the syntactic elements (constructors, axioms, assertions) for which the classical interpretation proves insufficient.
4. DF-ALC
4.1
Decomposition

The explosion in the number of information sources accessible via the Web multiplies the need for techniques that support reasoning over these sources. Ontology decomposition is an important research topic for fuzzy knowledge bases, since it allows a multi-representation of knowledge according to certainty degrees, which in turn lets us optimise reasoning. Reasoning over a large fuzzy ontology can be reduced to a few reasoning procedures over sub-axioms of the global ontology, based on the inference services of the tableau algorithm (satisfiability, subsumption). Our approach is applied to the description logic ALC. A decomposition of a fuzzy-ALC TBox is given by the following formation rules:

An axiom has one of the forms: i : C_x1 ⊑ D_x2, i : C_x1 ≡ D_x2, i : R_x1 ⊑ S_x2, where C and D are concept expressions, R and S are role names, and x1, x2 ∈ [0,1].

A concept expression has one of the forms: i : CN, ⊤, i : ¬C, i : C ⊔ D, i : C ⊓ D, i : ∃R.C, i : ∀R.C.

A bridge rule has the form: i : C_x1 ⊑ j : C_x2, with x1, x2 ∈ [0,1] and x2 ≥ x1, where C is a concept expression.

In the following sections, two reasoning approaches for decomposed ontologies are presented in more detail: a parallel technique and a distributed one. Tableau algorithms are applied to local ontologies, which are either merged in the parallel case or propagated in the distributed case.

4.2. TBox and ABox of DF-ALC

Most description logics consist of a TBox, which represents the terminology of a domain, and an ABox, which asserts particular individuals of that domain. Our work also introduces fuzziness into these two components, by adding a certainty degree to the terminological axioms (TBox) and a membership degree to the fuzzy individuals (ABox).
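The degree condition on bridge rules above (the target degree x2 must be at least the source degree x1) can be sketched as follows. This is our own illustrative encoding, not the paper's implementation; the tuple layout and example degrees are hypothetical.

```python
# Sketch (assumed encoding): a decomposed fuzzy TBox's bridge rules
# i:C >= x1  ->  j:C >= x2. Per the formation rule above, an identical
# bridge rule is only usable when x2 >= x1.

bridges = [
    # (source_tbox, concept, x1, target_tbox, x2) -- hypothetical values
    (1, "C", 0.6, 2, 0.8),
    (1, "D", 0.9, 2, 0.4),
]

def applicable(bridge):
    """An identical bridge rule applies only if the target degree x2
    is at least the source degree x1."""
    _, _, x1, _, x2 = bridge
    return x2 >= x1

print([applicable(b) for b in bridges])  # [True, False]
```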
Moreover, the construction of such a description logic must take the following points into account. A TBox is composed of atomic concepts and roles. Concepts are divided into two types:
- concept names CN, which appear on the left-hand side of an axiom;
- base concepts CB, which appear on the right-hand side.
A concept name may appear only once on the left-hand side of an axiom within the same sub-ontology.

The atomic concepts that define the axioms may be imprecise. Axioms that hold in the classical description logic ALC are not necessarily valid in DF-ALC. DF-ALC can represent the imprecision of atomic concepts by a fuzzy property taking a value in the interval [0,1]; this property requires no major modification when extending the syntax of the classical description logic. The theory of fuzzy sets proposed by Zadeh is used to compute the certainty degree of concepts under conjunction, disjunction, or negation. Beyond the fuzzy TBox proposed in DF-ALC, imprecision also appears in the ABox as a membership degree of an individual in a concept (e.g., C^I(x) = a with a ∈ [0,1]), so that ABox assertions can be written in the form C_a(x) ≥ b, where:
- the concept C must be satisfiable in the TBox;
- the instance x belongs to the concept C in the ABox.

4.3. Syntax and semantics of DF-ALC

We now give the formal definition of the description logic DF-ALC: its syntax and semantics. To ease reading, let A, C, and R denote respectively the sets of atomic concepts, complex concepts, and roles.

Table: Syntax and semantics of the DF-ALC constructors

Constructor                 | Syntax       | Semantics
Top                         | ⊤            | Δ^I
Bottom                      | ⊥            | ∅
Atomic concept              | A_a          | (A_a)^I ⊆ Δ^I
Atomic role                 | R_a          | (R_a)^I ⊆ Δ^I × Δ^I
Conjunction                 | C_a ⊓ D_b    | (C ⊓ D)^I with degree min(a, b)
Disjunction                 | C_a ⊔ D_b    | (C ⊔ D)^I with degree max(a, b)
Negation                    | ¬C_a         | C^I with degree 1 − a
Universal quantification    | ∀R_a.C_b     | inf { max(1 − R_a, C_b) }
Existential quantification  | ∃R_a.C_b     | sup { min(R_a, C_b) }

A fuzzy interpretation is a pair I = (Δ^I, ·^I), where Δ^I is the domain and ·^I is an interpretation function that maps each fuzzy concept/role to a membership function C: Δ^I → [0,1] / R: Δ^I × Δ^I → [0,1].
The interpretation function of DF-ALC must satisfy the following equations for all d ∈ Δ^I:

⊤^I(d) = 1;
⊥^I(d) = 0;
C^I(d) = μ(C(d));
(C ⊓ D)^I(d) = min(μ(C(d)), μ(D(d)));
(C ⊔ D)^I(d) = max(μ(C(d)), μ(D(d)));
(¬C)^I(d) = 1 − μ(C(d));
(∀R.C)^I(d) = inf_{d'∈Δ^I} { max(1 − μ(R(d, d')), C(d')) };
(∃R.C)^I(d) = sup_{d'∈Δ^I} { min(μ(R(d, d')), C(d')) }.

We consider a collection of description logics {LD_k}_{k∈K}, where K is a non-empty index set; for each k ∈ K, a TBox Γ_k is expressed in a concrete LD_k. To distinguish the descriptions of each TBox Γ_k, we prefix descriptions with the index of their TBox; for example, k : C_a denotes a concept C of {LD_k}_{k∈K} with certainty degree a. Semantic morphisms between TBoxes are expressed using bridge rules.

5. Reasoning with DF-ALC

DL systems provide users with several inference capabilities. Reasoning infers implicit knowledge from the explicit knowledge stored in the knowledge base. The elementary inference on concept expressions in a DL is subsumption between two concepts, which determines sub-concept/super-concept relationships. The elementary inference on individuals determines whether a given individual is an instance of a certain concept. Reasoning in the DL DF-ALC is a new challenge for large fuzzy knowledge bases. In this part of the paper we present some reasoning techniques for a decomposed fuzzy KB. Tableau algorithms reduce the subsumption problem to the satisfiability problem: indeed, C ⊑ D if and only if C ⊓ ¬D is unsatisfiable.
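The interpretation equations above can be sketched over a finite domain. This is our own illustrative encoding, not the paper's; the domain, concept, and role degrees are hypothetical.

```python
# Sketch (assumed encoding): the DF-ALC interpretation equations over a
# finite domain. Concepts are dicts d -> degree in [0,1]; roles are dicts
# (d, d') -> degree in [0,1].

DOMAIN = ["d1", "d2"]

def conj(C, D):     # (C ⊓ D)^I(d) = min(C^I(d), D^I(d))
    return {d: min(C[d], D[d]) for d in DOMAIN}

def disj(C, D):     # (C ⊔ D)^I(d) = max(C^I(d), D^I(d))
    return {d: max(C[d], D[d]) for d in DOMAIN}

def neg(C):         # (¬C)^I(d) = 1 - C^I(d)
    return {d: 1.0 - C[d] for d in DOMAIN}

def forall(R, C):   # (∀R.C)^I(d) = inf_{d'} max(1 - R(d,d'), C(d'))
    return {d: min(max(1.0 - R[(d, e)], C[e]) for e in DOMAIN) for d in DOMAIN}

def exists(R, C):   # (∃R.C)^I(d) = sup_{d'} min(R(d,d'), C(d'))
    return {d: max(min(R[(d, e)], C[e]) for e in DOMAIN) for d in DOMAIN}

C = {"d1": 0.7, "d2": 0.4}   # hypothetical fuzzy concept
R = {("d1", "d1"): 0.0, ("d1", "d2"): 0.9,
     ("d2", "d1"): 0.2, ("d2", "d2"): 0.0}   # hypothetical fuzzy role
print(exists(R, C)["d1"])   # 0.4 = max(min(0.0, 0.7), min(0.9, 0.4))
```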
The fuzzy tableau algorithm processes each sub-axiom (each sub-ontology) separately; each time it starts from an ABox A = {C_a(x) ≥ b} to check the unsatisfiability of the concept C_a.

5.1. Reasoning for a local TBox

The tableau method proposed in our work follows the same process already used in previous works, which establishes whether a description comprising both assertional and terminological aspects admits a model. The initial tableau is built as follows: for every assertional fact C_a(x) ≥ b, we add it to the tableau, T = {C_a(x) ≥ b}. We then build a tree of tableaux. The root, called the initial tableau, is reduced to the formula itself. Successors of a tableau T are built using the rules presented below. The leaves of this tree are:

- contradictory tableaux: they contain a pair of formulas p and ¬p;
- complete tableaux: they are not contradictory and no rule applies to them.

Rule    | Condition                                                          | Result
⊓-rule  | (C_a ⊓ D_b)(x) ≥ c ∈ T and {C_a(x) ≥ c, D_b(x) ≥ c} ⊄ T            | T' = T ∪ {C_a(x) ≥ c, D_b(x) ≥ c}
⊔-rule  | (C_a ⊔ D_b)(x) ≥ c ∈ T and C_a(x) ≥ c ∉ T, D_b(x) ≥ c ∉ T          | T' = T ∪ {C_a(x) ≥ c} or T' = T ∪ {D_b(x) ≥ c}
∃-rule  | (∃R_b.C_a)(x) ≥ c ∈ T and no z ∈ Δ^I with R_b(x,z) ≥ c, C_a(z) ≥ c ∈ T | T' = T ∪ {R_b(x,y) ≥ c, C_a(y) ≥ c} for a fresh y
∀-rule  | {(∀R_b.C_a)(x) ≥ c, R_b(x,y) ≥ c} ⊆ T and C_a(y) ≥ c ∉ T            | T' = T ∪ {C_a(y) ≥ c}

The construction of a local tableau stops either when a complete tableau is reached, in which case the formula is satisfiable, or when all leaves are contradictory nodes, in which case the formula is unsatisfiable. It remains to specify what a contradictory tableau is. The possible cases are:

- the tableau contains a formula ⊥(x);
- the tableau contains a pair of formulas C(x) and ¬C(x), where C is necessarily a primitive concept;
- a fuzzy conflict such as C_0(x) = a.

Let T be a complete tableau and M[T] the constructed model. Then:
- Δ^M[T] is the set of elements appearing in T;
- for an element x, x^M[T] = x;
- for a primitive concept A, x ∈ A^M[T] iff A(x) ∈ T;
- for a primitive role r, (x, y) ∈ r^M[T] iff r(x, y) ∈ T.

Termination of this method is not guaranteed unless a rule-application strategy is enforced. To this end we introduce some notation. Call an element y the father of a variable x if a_i(y, x) holds for some atomic role a_i. At creation a variable has a single father, and by inspection of the rules one checks that it can never acquire a second father.
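A minimal sketch of the branching behaviour of such rules follows. It is our own simplification, not the paper's full calculus: it covers only the ⊓- and ⊔-rules, and it uses the standard fuzzy clash condition {C(x) ≥ a, ¬C(x) ≥ b with a + b > 1} as an assumption, since the paper leaves the fuzzy conflict test informal.

```python
# Sketch (our own simplification): expanding fuzzy ABox assertions of the
# form (concept, individual, degree) with the ⊓- and ⊔-rules, then testing
# every leaf branch for a clash {C(x) >= a, ¬C(x) >= b} with a + b > 1.
# Concepts: a name string, ("not", C), ("and", C, D) or ("or", C, D).

def expand(branch):
    """Return the list of fully expanded branches (the tableau leaves)."""
    for (c, x, a) in branch:
        if isinstance(c, tuple) and c[0] == "and":   # ⊓-rule: both conjuncts
            rest = [f for f in branch if f != (c, x, a)]
            return expand(rest + [(c[1], x, a), (c[2], x, a)])
        if isinstance(c, tuple) and c[0] == "or":    # ⊔-rule: branch
            rest = [f for f in branch if f != (c, x, a)]
            return expand(rest + [(c[1], x, a)]) + expand(rest + [(c[2], x, a)])
    return [branch]                                  # complete branch

def has_clash(branch):
    for (c, x, a) in branch:
        for (d, y, b) in branch:
            if d == ("not", c) and x == y and a + b > 1:
                return True
    return False

# C ⊓ ¬C asserted with degree 0.8 is unsatisfiable: every leaf clashes.
leaves = expand([(("and", "C", ("not", "C")), "x", 0.8)])
print(all(has_clash(b) for b in leaves))  # True
```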
By inspection of the rules, the size of the largest concept formula of a variable is always strictly smaller than the size of the largest concept formula of its father. Likewise, the size of the largest concept formula of an individual is always at most the size of the largest concept formula of the initial tableau. Finally, note that every concept formula that appears is a sub-formula of a formula of the initial tableau. Our strategy for a sub-ontology is therefore as follows:

1. Apply a rule to a variable only if no rule is applicable to its ancestors.
2. For an element, apply the local rules (⊓-rule or ⊔-rule) first.
3. For an element, apply the generating rule (∃-rule) if the local rules are not applicable.
4. For an element, apply the propagation rule (∀-rule) if neither the local rules nor the generating rule are applicable.

Point 1 guarantees that once a rule has been applied to a variable, that variable will never disappear from the tableau. Indeed, by inspection of the rules we check that none of them is ever again applicable to individuals and their ancestors. Consequently, as the tableau evolves, only variables that are leaves of a tree and to which no rule has been applied may disappear. We therefore restrict attention to the definitive elements of the tableau, since the others are not subject to any rule application.

5.2. Distributed reasoning

In this section we introduce a reasoning algorithm based on the main idea of the distributed reasoning procedure of Luciano Serafini, Andrei Tamilin, and Le Pham [SER04a], [SER04b], [TLP07], which takes a complex concept C as input and returns the result of its (un)satisfiability test. We denote a decomposed TBox by T = [{Ti}, B]. A distributed TBox consists of a source TBox and target TBoxes.
However, this assignment is determined only when reasoning is executed: if a query is posed on the component TBox Ti, then Ti becomes the source TBox and the others become the target TBoxes. The main idea is first to find the local source TBox compatible with the query, and to try to build a complete tree by following the steps of the tableau algorithm defined in Section 5.1. Once these steps have been executed, by traversing the generated nodes and looking for the open branches of Ts, we must check for identical bridge rules between Ts and the other target TBoxes. Note that bridge rules go in a single direction. If a complete bridge rule is found, we assign the elements of the open branch of Ts to the target TBoxes, apply the local tableau algorithm again, and so on. This means that we can initially process a query posed on some TBox: if we start on T1, then T1 is viewed as the source TBox and T2, T3, ..., Tn as the target TBoxes. Thus, to detect local contradictions quickly, we initially run the algorithm on the TBoxes cited above. We obtain the following two cases:

1. Either T1(x) or T2(x) or ... or Tn(x) is unsatisfiable (i.e., all leaf nodes of

T1(x) or T2(x) or ... or Tn(x) are contradictions); then we conclude that x is also unsatisfiable with respect to the global T.
2. Or all the Ti are satisfiable (i.e., there exists at least one non-contradictory leaf in each Ti); then we apply the tableau algorithm on T2, T3, ..., Tn for the open nodes of T1 using the identical bridge rules. To apply an identical bridge rule, the certainty degree of the axiom in the target TBox must be greater than the certainty degree of the axiom in the source TBox.

6. Conclusion

Integrating fuzziness into description logic requires extracting sub-axioms from the general axiom in order to obtain an exact certainty degree for a concept defined from other atomic concepts. This extraction increases the size of the knowledge base, which in turn increases execution time and wastes storage during reasoning; for this reason we proposed a decomposition of the axioms, guaranteeing a sound presentation and efficient reasoning. Our proposal can serve as a basis for improving a reasoner that supports the description logic ALC; such a reasoner would handle two things at once: the fuzziness of concepts and roles on the one hand, and inter- and intra-TBox inferences on the other. It can also optimise reasoning in classical ALC, where the certainty degree is always 1. The perspectives of this work are twofold: first, to finish developing a reasoner that supports this kind of description logic; second, to project this work onto other, more expressive DLs.

7. References

[BAA91] Baader, F. and P. Hanschke. A schema for integrating concrete domains into concept languages. In Proc. of the 12th Int. Joint Conf. on Artificial Intelligence (IJCAI 91), 1991.
[BAA99] Baader, F. and U. Sattler.
Expressive number restrictions in description logics. Journal of Logic and Computation 9(3), 1999.
[BAA01] Baader, F. and U. Sattler. An overview of tableau algorithms for description logics. Studia Logica 69(1), 2001.
[BAA05] Baader, F., I. Horrocks, and U. Sattler. Description logics as ontology languages for the Semantic Web. In Mechanizing Mathematical Reasoning, 2005.
[BAA07] Baader, F., D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider. The Description Logic Handbook: Theory, Implementation, and Applications.
[BAA11] Baader, F. What's new in description logics. Informatik-Spektrum 34(5), 2011.
[BOR03] Borgida, A. and L. Serafini. Distributed description logics: Assimilating information from peer sources. Journal of Data Semantics, Vol. 1, 2003.
[BOB11] Bobillo, F. and U. Straccia. Fuzzy ontology representation using OWL 2. International Journal of Approximate Reasoning 52, 2011.
[KRÖ12] Krötzsch, M. OWL 2 profiles: An introduction to lightweight ontology languages. In: Eiter, T., Krennwallner, T. (eds.), Lecture Notes in Computer Science, Springer, 2012.
[NUT97] Nutt, W., F. M. Donini, M. Lenzerini, and D. Nardi. The complexity of concept languages. Inf. Comput., 1997.
[SER04a] L. Serafini and A. Tamilin. Drago: Distributed reasoning architecture for the semantic web. Technical Report, ITC-irst, 2004.
[SER04b] L. Serafini and A. Tamilin. Local tableaux for reasoning in distributed description logics. In Description Logics Workshop 2004, CEUR-WS Vol. 104, 2004.
[STO05] Stoilos, G. et al. Fuzzy OWL: Uncertainty and the semantic web. In Proc. of the International Workshop on OWL: Experiences and Directions (OWLED), 2005.
[STO07] Stoilos, G. et al. Reasoning with very expressive fuzzy description logics. Journal of Artificial Intelligence Research.
[STR04] Straccia, U. Transforming fuzzy description logics into classical description logics. In Proc. of the 9th European Conference on Logics in Artificial Intelligence (JELIA-04), 2004.
[STR06] Straccia, U. Answering vague queries in fuzzy DL-Lite. In Proc. of the 11th Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU-06), 2006.
[STR13] Straccia, U. Foundations of Fuzzy Logic and Semantic Web Languages. CRC Studies in Informatics Series, Chapman & Hall, 2013.
[TIM98] Tim Berners-Lee. Web design issues: What the Semantic Web can represent, 1998.
[TLP07] T. L. Pham and N. Le-Thanh. Some approaches of ontology decomposition in description logics. In Proceedings of the 14th ISPE, July 2007.
[ZED65] Zadeh, L. A. Fuzzy sets. Information and Control, Vol. 8, 1965.

Common-Knowledge, Communication and Cooperation Management
Epistemic Approach to Cooperative Management

Takashi Matsuhisa
Institute of Applied Mathematical Research
Karelia Research Centre, Russian Academy of Sciences
11, Pushkinskaya str., Petrozavodsk, Karelia, Russia

Abstract
Issues of moral hazard and adverse selection abound in every contract where one party has a self-interest and information that the other party does not possess. While this is a fertile research area, more information is still needed on how to handle a party to a contract who has more information than you. The moral hazard is very often the bottleneck, and buyer-supplier cooperation is an epitome. This paper re-examines the issue in the framework of a principal-agent model under uncertainty. It highlights epistemic conditions for a possible resolution of the moral hazard between the buyer and the suppliers. We show that the moral hazard disappears in the principal-agent model under uncertainty if the buyer and suppliers commonly know each agent's belief about the others' efforts, or if they communicate their beliefs about the others' efforts to each other through messages.

Keywords: Common-knowledge; Effort level; Moral hazard; Formal and real authority communication; Principal-agent model under uncertainty.

I. INTRODUCTION

Issues of moral hazard and adverse selection abound in every contract where one party has a self-interest and information that the other party does not possess. While this is a fertile research area, more information is still needed on how to handle a party to a contract who has more information than you. The Global Financial Crisis is an epitome of the moral hazard: managers and employees as agents and shareholders as principals.
In fact, the moral hazard still perplexes bankers and shareholders alike: shareholders still struggle with how to handle their agents, while insurers and bankers struggle to structure products that will reduce the impact of moral hazard. Such moral hazards are the bottlenecks in buyer-supplier cooperation, and buyer-supplier management is another epitome. The first formal analysis of the principal-agent relationship and the phenomenon of moral hazard was made by Arrow [2]. Many-sided moral hazard can arise when there are many agents that affect gross returns and their individual actions are not observed by each other; in particular, the treatment of the principal-agent model with many-sided moral hazard was given by Holmstrom [6]. He formalised it, in a partnership model, as the question whether there exist sharing rules that both balance the budget and under which an efficient action is a Nash equilibrium. Holmstrom [6] and Williams and Radner [12] respectively analysed the conditions for the existence of a sharing rule such that some action profile satisfies the first-order conditions for an equilibrium. Recently, Matsuhisa [8] and Matsuhisa and Jiang [9] adopted a new approach to many-sided moral hazard from the epistemic-model point of view developed by Aumann [3] and his followers in game theory; they analysed the moral hazard as a disagreement on expected marginal costs between the principal and agents in an extended model of principal and agents under uncertainty. They gave a necessary condition for the moral hazard not to appear: under some technical assumptions, the moral hazard disappears in the principal-agent model under uncertainty if the principal and agents can fully share information about their expected marginal costs, in the following two cases: first, when they commonly know the expected marginal costs (Matsuhisa [8]), and secondly, when they communicate the costs in the long run (Matsuhisa and Jiang [9]).
However, those papers assume the existence of a decision function consistent with the technical assumptions, and this existence has not been guaranteed. This paper aims to remedy the defect. We re-examine buyer-supplier cooperation with moral hazard as a problem of the principal-agent relationship. We present an extended principal-agent model under uncertainty, and we highlight hidden conditions for a possible resolution of the moral hazard between the buyer and the suppliers. To remove such moral hazard in buyer-supplier cooperation, our recommendation is that the principal and the agents should fully share information only about their conjectures on the others' effort levels, not about expected marginal costs, by making the efforts common knowledge or by communicating their beliefs about them. Consider a buyer (as principal) and more than two suppliers (as agents): the buyer manufactures products made of parts supplied by the suppliers, paying their costs, and obtains a profit by selling the products. Assume that the buyer and all suppliers aim to maximise their gross returns independently. The moral hazard arises because there is no sharing rule under which the buyer can make a contract with every supplier such that the total amount of all profits is refunded to each supplier in proportion to the supplier's contribution to the products; i.e., the expected costs are not equal between the buyer and the suppliers.

To investigate the phenomenon in detail we extend the principal-agent model with complete information to the principal-agent model with incomplete information. We assume that each agent, as well as the principal, k, has the following abilities regarding knowledge:

T: each k cannot know something when it does not occur;
4: each k knows that he/she knows something;
5: if k cannot know something, then he/she knows that he/she does not know it.

This structure is induced from a reflexive, transitive, and symmetric binary relation associated with the multi-modal logic S5. We focus on the situation in which the principal and the agents interact by sharing information as follows:

PR: the refunded proportional rates to suppliers are functions of each supplier's effort level;
CK: both the buyer and suppliers commonly know each agent's belief about the others' efforts;
CM: the buyer and suppliers communicate to each other their beliefs about the others' expected efforts as messages through the communication graph.

In this line we can show:

Theorem 1. Under condition PR, if either CK or CM is true, then all effort levels for which the expected marginal costs actually coincide for buyer and suppliers can be characterised as the critical points of the refunded proportional rate function. Consequently, if the refunded proportional rate is constant, then all marginal costs must coincide; i.e., there is no moral hazard.

The paper is organized as follows: Section II reviews the moral hazard in the classical principal-agent model (i.e., the principal-agent model with complete information) following Matsuhisa [8]. Section III recalls the formal model of knowledge and common-knowledge, and presents the principal-agent model under uncertainty. We give an illustrative example of our contract design problem in the principal-agent model under uncertainty. Section IV states Theorem 1 for CK with the proof.
Section V treats the contract design problem without the common-knowledge assumption. We present a communication model and give Theorem 1 for CM with proof. Finally we conclude with remarks.

II. MORAL HAZARD

This section recalls the moral hazard in the principal-agent model following Matsuhisa [8] (Section 2). Consider the principal P and n agents {1, 2, ..., k, ..., n} (n ≥ 1) in a firm. The principal makes a profit by selling the products made by the agents. He/she makes a contract with each agent k that the total amount of all profits is refunded to each agent k in proportion to the agent's contribution to the firm. Let e_k denote the measure of managerial effort for k's productive activities, called k's effort level or simply k's effort, with e_k ∈ R_+. Let I_k(x_k) be a real-valued continuously differentiable function on R_+, interpreted as the profit from selling the products made by agent k with effort e_k at cost c(e_k). We assume I'_k(x_k) ≥ 0, and the cost function c(·) is a real-valued continuously differentiable function on R_+. Let I_P be the total amount of all profits:

I_P(x) = I_P(x_1, x_2, ..., x_k, ..., x_n) = Σ_{k=1}^n I_k(x_k).

The principal P cannot observe the efforts e_k, and views each as a random variable e_k on a probability space (Ω, μ); i.e., e_k is a μ-measurable function from Ω to R_+. We introduce the ex-post expectations:

Exp[I_P(e)] := Σ_{ξ∈Ω} I_P(e(ξ)) μ(ξ) and Exp[I_k(e_k)] := Σ_{ξ∈Ω} I_k(e_k(ξ)) μ(ξ).

The optimal plan for the principal then solves the problem:

Max_{e=(e_1, e_2, ..., e_k, ..., e_n)} { Exp[I_P(e)] − Σ_{k=1}^n Exp[c(e_k)] }.

Let W_k(e_k) be the total amount of the refund to agent k: W_k(e_k) = r_k I_P(e), with Σ_{k=1}^n r_k = 1 and 0 ≤ r_k ≤ 1, where r_k denotes the proportional rate representing k's contribution to the firm. The optimal plan for each agent also solves the problem: for every k = 1, 2, ..., n,

Max_{e_k} { Exp[W_k(e_k)] − Exp[c(e_k)] } subject to Σ_{k=1}^n r_k = 1, 0 ≤ r_k ≤ 1.
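The two optimisation problems just stated pull the marginal cost in incompatible directions when r_k is constant. A toy numeric check of this tension, with hypothetical numbers of our own (not from the paper):

```python
# Toy illustration (hypothetical numbers): with a constant refund rate r_k,
# the principal's first-order condition requires
#     c'(e_k) = d/de_k Exp[I_k(e_k)] = m,
# while agent k's first-order condition requires
#     c'(e_k) = r_k * m.
# For 0 < r_k < 1 both can hold only if m = 0 -- the moral hazard.

m = 2.0      # hypothetical marginal expected profit d/de_k Exp[I_k]
r_k = 0.5    # hypothetical constant refund rate, 0 < r_k < 1

principal_foc = m        # marginal cost demanded by the principal's optimum
agent_foc = r_k * m      # marginal cost demanded by agent k's optimum
print(principal_foc == agent_foc)  # False: no common equilibrium effort
```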
We assume that r_k is independent of e_k; the necessary conditions for critical points are then as follows. For each agent k = 1, 2, ..., n we obtain

∂/∂e_k Exp[I_k(e_k)] − Exp[c'(e_k)] = 0 and r_k ∂/∂e_k Exp[I_k(e_k)] − Exp[c'(e_k)] = 0,

which contradicts 0 < r_k < 1, because

c'(e_k) = ∂/∂e_k Exp[I_k(e_k)] = r_k ∂/∂e_k Exp[I_k(e_k)].

This contradictory situation is called a moral hazard in the principal-agent model; i.e., there is no equilibrium effort level as a solution of the contract design problem.

III. THE MODEL

Let N be a set of finitely many agents, with k denoting an agent and P the principal. Specifically, N = {P, 1, 2, ..., k, ..., n} consists of the principal P and the agents {1, 2, ..., k, ..., n} in a firm. A state-space Ω is a non-empty finite set, whose members are called states. An event is a subset of the state-space. We denote by 2^Ω the field of all subsets of Ω. An event E is said to occur in a state ω if ω ∈ E.

A. Information and Knowledge.

By a partitional information structure we mean ⟨Ω, (Π_i)_{i∈N}⟩ in which Π_i: Ω → 2^Ω satisfies the postulates: for each i ∈ N and for any ω ∈ Ω,

Ref: ω ∈ Π_i(ω);
Trn: ξ ∈ Π_i(ω) implies Π_i(ξ) ⊆ Π_i(ω);
Sym: if ξ ∈ Π_i(ω) then ω ∈ Π_i(ξ).

The set Π_i(ω) will be interpreted as the set of all the states of nature that i knows to be possible at ω, or as the set of

the states that i cannot distinguish from ω. We call Π_i(ω) i's information set at ω.

Definition 1. The S5n-knowledge structure is a tuple ⟨Ω, (Π_i)_{i∈N}, (K_i)_{i∈N}⟩ that consists of a partitional information structure ⟨Ω, (Π_i)_{i∈N}⟩ and a class of i's knowledge operators K_i: 2^Ω → 2^Ω defined by K_i E = {ω | Π_i(ω) ⊆ E}.

The event K_i E will be interpreted as the set of states of nature in which i knows E to be possible. We record the properties: for every E, F ∈ 2^Ω,

N: K_i Ω = Ω;
K: K_i(E ∩ F) = K_i E ∩ K_i F;
T: K_i E ⊆ E;
4: K_i E ⊆ K_i(K_i E);
5: Ω \ K_i E ⊆ K_i(Ω \ K_i E).

B. Common-Knowledge.

Let S be a subset of N. The mutual knowledge operator among a coalition S is the operator K^S: 2^Ω → 2^Ω defined as the intersection of all individual knowledge: K^S F = ∩_{i∈S} K_i F, interpreted as "everyone in S knows F".

Definition 2. The common-knowledge operator among a coalition S is the operator K_C^S: 2^Ω → 2^Ω defined by K_C^S F = ∩_{n∈N} (K^S)^n F. An event E is common-knowledge among S at ω ∈ Ω if ω ∈ K_C^S E.

The intended interpretation is as follows: K_C^S E is the event that every agent in S knows E, every agent in S knows that every agent in S knows E, every agent in S knows that everyone knows that every agent in S knows E, and so on.

C. Principal-Agent Model under Uncertainty.

Let us reconsider the principal-agent model, with notation and assumptions as in the section above. We shall introduce the extended principal-agent model.

Definition 3.
By a principal-agent model under uncertainty we mean a structure

M = ⟨N, (Ω, μ), (e_k)_{k∈N}, (I_k)_{k∈N}, I_P, (r_k)_{k∈N}, (c_k)_{k∈N}, (Π_k)_{k∈N}⟩

in which:

1) N = {P, 1, 2, ..., k, ..., n}, where P is the principal and each k is an agent, and (Ω, μ) is a probability space;
2) e_k is a random variable from Ω to R_+, with e_k(ω) a real variable in R_+;
3) I_k(x_k) is agent k's profit function, with I_k(e_k) the profit from his/her effort e_k, sufficiently differentiable on R_+ with I'_k ≥ 0, and I_P(x) = I_P(x_1, x_2, ..., x_n) = Σ_{k=1}^n I_k(x_k) is the profit function of the firm (the total amount of all the agents' profits);
4) r_k is a proportional-rate function in the contract, sufficiently differentiable and weakly increasing on R_+ with 0 < r_k ≤ 1 for k = 1, 2, ..., n;
5) c_k is the cost function for agent k, sufficiently differentiable on R_+, with c_k(e_k) interpreted as the cost to k of effort level e_k;
6) (Π_k)_{k∈N} is an information structure satisfying the three postulates Ref, Trn, and Sym.

For each e_k ∈ R_+ and e = (e_1, e_2, ..., e_k, ..., e_n) ∈ R_+^n, denote by [e_k] the event of k's effort, [e_k] = {ξ ∈ Ω | e_k(ξ) = e_k}, and by [e] the event of total efforts, [e] = ∩_{k∈N} [e_k]. For any non-empty subset S of N, we denote [e_S] = ∩_{k∈S} [e_k] and [e_{-k}] = ∩_{l∈N\{k}} [e_l].

D. Bayesian approach.

Accordingly, we assume that each agent k knows his/her own effort e_k but cannot know the others' efforts e_{-k}, and that the principal P cannot know the efforts of any agent. The former assumption is formulated as

KE: [e_k] ⊆ K_k([e_k]) for every e_k ∈ R_+.

The latter assumption means that the principal cannot have exact knowledge of the agents' effort levels e, and that each agent cannot have exact knowledge of the others' efforts e_{-k} ∈ R_+^{n−1}.

E. Belief and Conjecture.
Following these interpretations, we introduce the notion of belief about the others' effort levels. By the principal P's belief about the agents' efforts e we mean a probability q_P(e) of e, and by an agent k's belief about the other agents' efforts e_{-k} we mean a probability q_k(e_{-k}) of e_{-k}. The conjecture q_P(e; ω) of the principal P about the agents' efforts e is defined by q_P(e; ω) = μ([e] | Π_P(ω)), and the conjecture q_k(e_{-k}; ω) of agent k about the other agents' efforts e_{-k} ∈ R_+^{n−1} is q_k(e_{-k}; ω) = μ([e_{-k}] | Π_k(ω)). By the event of P's belief about the agents' efforts e we mean [q_P(e)] := {ξ ∈ Ω | q_P(e; ξ) = q_P(e)}, and by the event of k's belief about the other agents' efforts e_{-k} we mean [q_k(e_{-k})] := {ξ ∈ Ω | q_k(e_{-k}; ξ) = q_k(e_{-k})}. It should be noted that, by KE, q_k(e_{-k}; ω) = q_k(e; ω) and so [q_k(e_{-k}; ω)] = [q_k(e; ω)].

F. Interim expectation.

By the interim expectation (or simply expectation) of I_P we mean

Exp[I_P(e) | Π_P](ω) := Σ_{ξ∈[e]} I_P(e_1(ξ), e_2(ξ), ..., e_n(ξ)) μ(ξ | Π_P(ω)) = I_P(e) q_P(e; ω),

by the interim expectation of I_k we mean

Exp[I_k(e_k) | Π_k](ω) := I_k(e_k) q_k(e_{-k}; ω) = Σ_{ξ∈[e_{-k}]} I_k(e_k(ξ)) μ(ξ | Π_k(ω)),

and the interim expectation of agent k's income W_k is

Exp[W_k(e_k) | Π_k](ω) := r_k(e_k) Exp[I_P(e_k, e_{-k}) | Π_k](ω) = Σ_{e_{-k}∈E_{-k}} r_k(e_k) I_P(e_k, e_{-k}) q_k(e_{-k}; ω).
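The knowledge operators, common knowledge, and conjectures defined above can be sketched over a small finite state-space. This is our own illustrative encoding; the four-state space, uniform prior, and the two partitions are hypothetical.

```python
# Sketch (assumed encoding): partitional information, the knowledge operator
# K_i E = {w : Pi_i(w) ⊆ E}, iterated mutual knowledge to a fixpoint for
# common knowledge, and a conjecture q(event; w) = mu(event | Pi_i(w)).
from fractions import Fraction

OMEGA = {1, 2, 3, 4}
MU = {w: Fraction(1, 4) for w in OMEGA}              # uniform prior
PI_1 = {1: {1, 2}, 2: {1, 2}, 3: {3, 4}, 4: {3, 4}}  # agent 1's partition
PI_2 = {1: {1, 3}, 2: {2, 4}, 3: {1, 3}, 4: {2, 4}}  # agent 2's partition

def K(partition, E):
    """K_i E: states where the agent with this partition knows event E."""
    return {w for w in OMEGA if partition[w] <= E}

def common_knowledge(partitions, E):
    """K_C^S E as the fixpoint of the mutual-knowledge operator."""
    F = set(E)
    while True:
        G = set.intersection(*(K(p, F) for p in partitions))
        if G == F:
            return G
        F = G

def conjecture(partition, event, w):
    """q(event; w) = mu(event | Pi(w))."""
    cell = partition[w]
    return sum(MU[v] for v in event & cell) / sum(MU[v] for v in cell)

E = {1, 2}                                 # e.g. "agent 1's effort is high"
print(K(PI_1, E))                          # {1, 2}: 1 knows her own effort
print(K(PI_2, E))                          # set(): 2 never knows it
print(common_knowledge([PI_1, PI_2], E))   # set(): never common knowledge
print(conjecture(PI_2, E, 1))              # 1/2: agent 2's conjecture
```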

G. Contract design problem.

We treat the maximisation problems for the optimal plans of the principal and the agents: find effort levels e* = (e*_1, e*_2, ..., e*_k, ..., e*_n) ∈ R_+^n such that, subject to Σ_{k=1}^n r_k = 1, 0 < r_k ≤ 1,

PE: Max_{e=(e_k)_{k=1,2,...,n}} { Exp[I_P(e) | Π_P(ω)] − Σ_{k=1}^n Exp[c_k(e_k)] };
AE: Max_{e_k} { Exp[W_k(e_k) | Π_k(ω)] − Exp[c_k(e_k)] }.

Example 1. Consider the following principal-agent model under uncertainty. N = {P, 1, 2}; Ω = {ω_1, ω_2, ω_3, ω_4}, where each state encodes one of the two effort types {H, L} for agent 1 and one of {h, l} for agent 2, as given in Table I; μ is the uniform probability measure on 2^Ω, i.e., μ(ω) = 1/4. The information partitions (Π_i)_{i=P,1,2} are:

- Π_P(ω) = Ω;
- Π_1(ω) = {ω_1, ω_2} for ω = ω_1, ω_2, and Π_1(ω) = {ω_3, ω_4} for ω = ω_3, ω_4;
- Π_2(ω) = {ω_1, ω_3} for ω = ω_1, ω_3, and Π_2(ω) = {ω_2, ω_4} for ω = ω_2, ω_4.

The effort variables e_i: Ω → R_+ are defined by e_1(ω) = x_h for ω = ω_1, ω_2 and e_1(ω) = x_l for ω = ω_3, ω_4 with x_h ≠ x_l, and e_2(ω) = y_h for ω = ω_1, ω_3 and e_2(ω) = y_l for ω = ω_2, ω_4 with y_h ≠ y_l; hence [e_1(ω)] = Π_1(ω) and [e_2(ω)] = Π_2(ω). This means that agent 1's effort at ω_1, ω_2 is higher than the effort at ω_3, ω_4, and agent 2's effort at ω_1, ω_3 is higher than the effort at ω_2, ω_4. I_1(x) and I_2(y) are profit functions, and I_P(x, y) = I_1(x) + I_2(y) is the total amount of the profits. Under this situation we obtain:

E[W_1 | Π_1](ω) = r_1(x_h) I_P(x_h, y_j) (ω = ω_1, ω_2; j = h, l),
E[W_1 | Π_1](ω) = r_1(x_l) I_P(x_l, y_j) (ω = ω_3, ω_4; j = h, l),
E[W_2 | Π_2](ω) = r_2(y_h) I_P(x_i, y_h) (ω = ω_1, ω_3; i = h, l),
E[W_2 | Π_2](ω) = r_2(y_l) I_P(x_i, y_l) (ω = ω_2, ω_4; i = h, l),
E[I_P | Π_P](ω) = (1/4) Σ_{i,j=h,l} (I_1(x_i) + I_2(y_j)).

                 h (e_2 = y_h)   l (e_2 = y_l)
H (e_1 = x_h)    ω_1             ω_2
L (e_1 = x_l)    ω_3             ω_4

TABLE I.
TYPES OF STATES AND VARIABLES

Then we can observe that there is no moral hazard if $r_1(e) = r_2(e) \equiv \frac12$; hence any effort level can be a solution of the above contract problems PE and AE.

IV. MAIN THEOREM

For the beliefs $q_P$, $q_k$ of the principal $P$ and agent $k$, we refer to the condition:

BCK: $\bigcap_{k \in N} [q_k] \cap K^C_N([q_P]) \ne \emptyset$.

The interpretation of BCK is that all the agents commonly know the principal's belief $q_P$ at some state where all agents actually have their beliefs $(q_k)_{k \in N}$. Under these circumstances we can now restate Theorem 1 as follows:

Theorem 2. In the principal-agent model under uncertainty with KE, assume that the principal $P$ and each agent $k$ actually have beliefs $q_P$, $q_k$. If all agents commonly know the principal's belief $q_P$, then each solution $e^*_k$ ($k \in N$) of the contract design problems PE, AE must be a critical point of $r_k$ for every $k \in N$; i.e., if BCK is true then $r'_k(e^*_k) = 0$. In this case the proportional rate $r_k$ is determined by the principal's belief: $r_k(e^*_k) = q_P(e^*_k)$.

Before proceeding with the proof, we note that Theorem 2 explains the resolution of moral hazard in Example 1. In fact, since $K^C_{\{1,2\}}([q_P(e(\omega); \omega)]) = \Omega$, the event $[q_P(e(\omega); \omega)]$ is common knowledge everywhere among agents 1 and 2, and further, $r_1$ and $r_2$ are constants with $r_1(e_1(\omega)) = q_P(e_1(\omega); \omega) = \frac12$ and $r_2(e_2(\omega)) = q_P(e_2(\omega); \omega) = \frac12$. Hence the resolution of moral hazard in Example 1 is described by Theorem 2. We shall turn to the proof of Theorem 2.

Critical points condition.
Partially differentiating the expressions in the braces of the problems PE and AE with respect to $e_k$ yields the necessary condition for critical points for every $k \in N$. From PE we have

$\mathrm{Exp}[I'_k(e_k) \mid \Pi_P(\omega)] = I'_k(e_k)\,q_P(e;\omega) = \mathrm{Exp}[c'_k(e_k)]$;

and from AE the condition is, subject to $0 < r_k < 1$ and $\sum_{k \in N} r_k = 1$,

$r'_k(e_k)\,\mathrm{Exp}[I_k(e_k) \mid \Pi_k(\omega)] + r_k(e_k)\,\mathrm{Exp}[I'_k(e_k) \mid \Pi_k(\omega)] = \mathrm{Exp}[c'_k(e_k)]$ (1)

The following proposition plays another central role in the proof of Theorem 2:

Proposition 1 (Decomposition theorem). Suppose that the principal $P$ and each agent $k$ have beliefs $q_P$, $q_k$ with BCK in the principal-agent model under uncertainty with KE. Then for every $k \in N$, $q_P(e) = q_k(e_{-k})\,q_P(e_k)$ for any $e = (e_1, e_2, \dots, e_k, \dots, e_n) \in \mathbb{R}^n_+$.

Proof: Let $M = K^C_N([q_P])$, which is nonempty by BCK, and take $\omega \in M$. For each $k \in N$ and for each $e = (e_1, e_2, \dots, e_k, \dots, e_n) \in \mathbb{R}^n_+$, set $H_k = [e_k] \cap M$. We can easily observe that both $M$ and $H_k$ satisfy the properties below:

(1) $\omega \in L \subseteq [q_P(e;\omega)] \cap [q_k(e_{-k};\omega)]$ for $L = M$ and $H_k$;
(2) $M$ is decomposed into the disjoint union of components $\Pi_P(\omega)$ for $\omega \in M$;
(3) $L$ is decomposed into the disjoint union of components $\Pi_k(\omega)$ for $\omega \in L$, for $L = M$ and $H_k$.

Therefore, considering $X = [e_{-k}]$, it follows by (3) for $L = H_k$ that $\mu([e_{-k}] \mid H_k) = \mu([e_{-k}] \mid \Pi_k(\omega))$, and hence $\mu([e_{-k}] \mid H_k) = q_k(e_{-k};\omega)$. Dividing by $\mu(M)$ yields

$\mu([e] \mid M) = q_k(e_{-k};\omega)\,\mu([e_k] \mid M)$. (2)

Considering $X = [e]$, it follows by (2) for $L = M$ that $\mu([e] \mid M) = \mu([e] \mid \Pi_P(\omega))$. Hence we can observe

$\mu([e] \mid M) = q_P(e;\omega)$, (3)

and thus

$\mu([e_k] \mid M) = q_P(e_k;\omega)$. (4)

It follows by Eqs. (2), (3) and (4) that $q_P(e;\omega) = q_k(e_{-k};\omega)\,q_P(e_k;\omega)$. Noting $\omega \in [q_P(e)] \cap [q_k(e_{-k})]$, we have shown that $q_P(e) = q_k(e_{-k})\,q_P(e_k)$.

Proof of Theorem 2: Notations and assumptions are as in Proposition 1. In view of Eq. (1), it can easily be observed that the former part of Theorem 2 follows from Proposition 1. In particular, if $r_k$ is a constant function, then $r'_k = 0$, and so the latter part also follows immediately.

The example below shows that Theorem 2 cannot hold without BCK.

Example 2. Let us consider the principal-agent model under uncertainty $M$ given as in Example 1, only replacing $\Pi_P$ with $\Pi_P(\omega) = \{\omega\}$ for $\omega \in \Omega$. It can plainly be observed that

$E[I_P \mid \Pi_P](\omega) = I_1(x_h) + I_2(y_h)$ for $\omega = \omega_1$; $I_1(x_h) + I_2(y_l)$ for $\omega = \omega_2$; $I_1(x_l) + I_2(y_h)$ for $\omega = \omega_3$; $I_1(x_l) + I_2(y_l)$ for $\omega = \omega_4$;

and $[q_P(e;\omega)] = \{\omega\}$. Noting that $E[W_1 \mid \Pi_1](\omega)$ and $E[W_2 \mid \Pi_2](\omega)$ are as given in Example 1, it can be seen that $K^C_{\{1,2\}}([q_P(e;\omega)]) = \emptyset$ and $r_1 = r_2 = 1$, a contradiction. Therefore there is no solution of the contract design problems AE, PE when BCK does not hold.

V. COMMUNICATION
We consider the situation in which each agent bargains independently with the principal using his/her conjecture about the others' efforts: agent 1 communicates his/her conjecture $q^0_1(e_{-1};\omega)$ about the others' efforts to the principal $P$ as a message, and the recipient $P$ refines her/his information partition $\Pi^1_P$ according to the message. She/he forms the revised conjecture $q^1_P(e;\omega)$ and sends it to agent 2. Agent 2 refines his/her information partition $\Pi^2_2$ according to the message, forms the revised conjecture $q^2_2(e_{-2};\omega)$, and sends it back to the principal $P$. The recipient $P$ refines her/his information partition $\Pi^3_P$ according to the message, forms the revised conjecture $q^3_P(e;\omega)$, and so on. The principal refines his/her information partition $\Pi^n_P$ according to the message from agent $n-1$, revises the conjecture to $q^n_P(e;\omega)$, and sends it to agent $n$. The recipient $n$ refines her/his information partition $\Pi^{n+1}_n$ according to the message, forms the revised conjecture $q^{n+1}_n(e;\omega)$, and so on. This protocol is illustrated as

Pr: 1 → P → 2 → P → 3 → P → 4 → P → ⋯ → n−1 → P → n → P → 1 → P → 2 → ⋯

Under these circumstances we consider the maximisation problems for the optimal plans of the principal and the agents: find effort levels $e^* = (e^*_1, e^*_2, \dots, e^*_k, \dots, e^*_n) \in \mathbb{R}^n_+$ satisfying, subject to $\sum_{k=1}^n r_k = 1$ and $0 < r_k \le 1$,

PE: $\max_{e=(e_k)_{k=1,2,\dots,n}} \{\mathrm{Exp}[I_P(e) \mid \Pi_P(\omega)] - \sum_{k=1}^n \mathrm{Exp}[c_k(e_k)]\}$;

AE: $\max_{e_k} \{\mathrm{Exp}[W_k(e_k) \mid \Pi_k(\omega)] - \mathrm{Exp}[c_k(e_k)]\}$.

Revision process through communication. We assume that the principal and agents communicate by sending messages. Let $T$ be the discrete time horizon $\{0, 1, 2, \dots, t, \dots\}$. A protocol is a mapping Pr: $T \to N \times N$, $t \mapsto (s(t), r(t))$, such that $s(t) \ne r(t)$. Here $t$ stands for time, and $s(t)$ and $r(t)$ are, respectively, the sender and the recipient of the communication which takes place at time $t$.
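A protocol $t \mapsto (s(t), r(t))$ of this kind can be sketched directly. In the toy sketch below, the round-robin schedule among a principal "P" and two agents is an illustrative assumption; the check verifies that every member communicates, directly or indirectly, with every other one infinitely often (strong connectivity of the communication graph):

```python
from itertools import cycle, islice

def round_robin(members, rounds):
    """Yield (sender, recipient) pairs: 1 -> P -> 2 -> P -> 1 -> ..."""
    agents = [m for m in members if m != "P"]
    seq = []
    for a in agents:
        seq += [(a, "P"), ("P", None)]   # P forwards to the next sender
    pairs = []
    for i, (s, r) in enumerate(seq):
        if r is None:                    # fill in P's recipient: the next sender
            r = seq[(i + 1) % len(seq)][0]
        pairs.append((s, r))
    return list(islice(cycle(pairs), rounds))

def is_fair(pairs, members):
    """Fairness check: the protocol graph is strongly connected."""
    edges = set(pairs)
    def reach(start):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for (a, b) in edges:
                if a == u and b not in seen:
                    seen.add(b); stack.append(b)
        return seen
    return all(reach(m) == set(members) for m in members)

members = ["P", "1", "2"]
pairs = round_robin(members, 8)
print(pairs[:4])                # [('1', 'P'), ('P', '2'), ('2', 'P'), ('P', '1')]
print(is_fair(pairs, members))  # True
```

The schedule also satisfies the condition $r(t) = s(t+1)$ required by Definition 4 below.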
We view the protocol as the directed graph whose vertices are the members of $N$ and which has an edge (arc) from $i$ to $j$ if and only if there are infinitely many $t$ such that $s(t) = i$ and $r(t) = j$. A protocol Pr is said to be fair if the graph is strongly connected; in words, every player in this protocol communicates directly or indirectly with every other player infinitely often. It is said to contain a cycle if there are players $i_1, i_2, \dots, i_k$ with $k \ge 3$ such that for all $m < k$, $i_m$ communicates directly with $i_{m+1}$, and such that $i_k$ communicates directly with $i_1$. The communication is assumed to proceed in rounds.

By a revision process function we mean a correspondence $R$ assigning to each pair of partitions $(P_1, P_2)$ of $\Omega$ a partition $R(P_1, P_2)$ of $\Omega$.

Definition 4. A communication process $\pi(M)$ with revisions of the principal's and agents' conjectures $(q^t_i)_{(i,t) \in N \times T}$ according to a revision process function $R$ is a tuple $\pi(M) = \langle M, \mathrm{Pr}, R, (\Pi^t_i), (q^t_i)_{(i,t) \in N \times T} \rangle$ with the following structure:
1) $M = \langle N, (I_k)_{k \in N}, (e_k)_{k \in N}, (r_k)_{k \in N}, \Omega, \mu, (\Pi_k)_{k \in N}, (c_k)_{k \in N} \rangle$ is a principal-agent model under uncertainty with a common prior $\mu$ on a state space $\Omega$;
2) Pr is a protocol among $N$, $\mathrm{Pr}(t) = (s(t), r(t))$; it is fair and satisfies the conditions that $r(t) = s(t+1)$ for every $t$ and that the communications proceed in rounds;
3) $R$ is a revision process function;
4) $\Pi^t_i$ is the revised information structure at time $t$, defined as a mapping of $\Omega$ into $2^\Omega$ for $i \in N$.

If $i = s(t)$ is a sender at $t$, the message sent by $i$ to $j = r(t)$ is $M^t_i$. An $n$-tuple $(q^t_i)_{i \in N}$ is a revision process of individual conjectures about the others' efforts. These structures are inductively defined as follows. Set $\Pi^0_i(\omega) = \Pi_i(\omega)$. Assume that $\Pi^t_i$ is defined; it yields the distribution $q^t_i(e;\omega) = \mu([e] \mid \Pi^t_i(\omega))$. Whence the message $M^t_i : \Omega \to 2^\Omega$ sent by the sender $i$ at time $t$ is defined by $M^t_i(\omega) = \bigcap_{e_{-i} \in E_{-i}} \{\xi \in \Omega \mid q^t_i(e_{-i};\xi) = q^t_i(e_{-i};\omega)\}$. Then: the revised partition $\Pi^{t+1}_i$ at time $t+1$ is defined by $\Pi^{t+1}_i = R(\Pi^t_i, M^t_{s(t)})$, and it yields the revised conjecture $q^{t+1}_i(e;\omega) := \mu([e] \mid \Pi^{t+1}_i(\omega))$; $i$ sends the information on the revised conjecture $M^{t+1}_i(\omega)$ to the recipient $r(t+1)$ according to the protocol Pr. The specification is that a sender $s(t)$ at time $t$ informs the recipient $r(t)$ of his/her prediction about the others' efforts. The recipient revises her/his information structure with the information received through the message, predicts the others' efforts, and informs the next recipient $r(t+1)$ of her/his predictions.

Convergence: Because $\Omega$ is finite, there exists a sufficiently large $\tau \in T$ such that for all $\omega \in \Omega$, $q^\tau_i(\cdot;\omega) = q^{\tau+1}_i(\cdot;\omega) = q^{\tau+2}_i(\cdot;\omega) = \cdots$, and we denote this $\tau$ by $\infty$. Hence we can write $q^\tau_i$ as $q^\infty_i$.

VI. AUTHORITY

We shall focus on two types of communication: one in which the principal has formal authority, and one in which she has real authority. In the case of P-formal authority, the principal holds the formal authority as the superior and may overrule the agent; but she never actually does so, even if she is informed and the agent's recommendation is not congruent. She always accepts the agent's recommendation and rubber-stamps the proposal, so that in practice the agent cannot be overruled by the principal; we then say the agents have real authority. This covers a constitutional monarchy, in which the principal is the queen/king and the agent is the cabinet.
In the case of P-real authority, the principal has real authority: when she is informed and the agent's recommendation is not congruent, she can refer the recommendation back to the agent to bring it into congruence with her preferences, or make the decision herself. This covers an absolute monarchy.

Definition 5. By P-formal authority communication we mean a communication process $\pi(M)$ in Definition 4 satisfying:
1) the communication starts by a message sent from an agent, not from the principal; i.e., $s(0) \ne P$;
2) the revision process function is given by

$R(P_i, P_j) = P_i \vee P_j$ if $(i, j) = (s(t), P)$ for $t \in T$; $P_i \wedge P_j$ if not,

where $P_i \vee P_j$ denotes the finest common coarsening of the two partitions $P_i$ and $P_j$, and $P_i \wedge P_j$ is the common refinement of the two partitions $P_i$ and $P_j$.

Definition 6. By P-real authority communication we mean a communication process $\pi(M)$ in Definition 4 given by the revision process function $R(P_i, P_j) = P_i \wedge P_j$ for $(i, j) = (s(t), r(t))$, $t \in T$.

Example 3. Let us consider the communication process $\pi(M)$ according to the protocol Pr below:
1) the principal-agent model under uncertainty $M$ is the same as in Example 2;
2) Pr: 1 → P → 2 → P → 1 → P → 2 → P → ⋯

Then we can observe:

P-formal authority communication: if the principal has the formal authority, the limit information structure is given by $\Pi^\infty_P = \Omega$, $\Pi^\infty_1 = \Pi_1$, $\Pi^\infty_2 = \Pi_2$, and in this case it can be shown that there is no moral hazard.

P-real authority communication: on the other hand, if the principal has the real authority, the limit information structure is given by $\Pi^\infty_P(\omega) = \Pi^\infty_1(\omega) = \Pi^\infty_2(\omega) = \{\omega\}$, and it is easily observed that the moral hazard still remains.

Remark 1. When the communication is started by the principal, as in the protocol Pr below, a moral hazard also remains:

Pr: P → 1 → P → 2 → P → 1 → P → 2 → P → ⋯

VII. RESOLUTION BY COMMUNICATION

This section investigates the moral hazard problem in the communication model. Let notations and assumptions be the same as in Section V.
For a solution of the contract design problems PE, AE, we restate the rest of Theorem 1:

Theorem 3. In the principal-agent model under uncertainty with KE, assume that the principal $P$ and each agent $k$ communicate according to the communication process $\pi(M)$, so that they have limit conjectures $q^\infty_P$, $q^\infty_k$. Then for the contract design problems PE, AE we obtain:
(i) In the case of P-formal authority communication, every solution $e^*_k$ ($k \in N$) of the contract design problems PE, AE must be a critical point of $r_k$ for every $k \in N$; i.e., $r'_k(e^*_k) = 0$ after long-run communication. Furthermore, the proportional rate $r_k$ is determined by the principal's belief: $r_k(e^*_k) = q^\infty_P(e^*_k)$.
(ii) In the case of P-real authority communication, there are in general no solutions of the contract design problems PE, AE, and thus a moral hazard still remains after long-run communication.

Proof: (i) By the definition of P-formal authority it can plainly be observed that $\Pi^\infty_1(\omega) \cap \Pi^\infty_2(\omega) \subseteq [q^\infty_P(\omega)]$, and it is verified that $\bigcap_{k \in N} [q^\infty_k] \subseteq K^C_N([q^\infty_P])$. Part (i) then follows immediately from Theorem 2. (ii) Example 3 gives a counterexample.
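The two revision operators that drive parts (i) and (ii) can be checked on the partitions of Example 3. In the toy sketch below, `join` (finest common coarsening) plays the role of the principal's P-formal authority revision, and `meet` (coarsest common refinement) plays the role of the P-real authority revision; iterating them reproduces the limit information structures stated in Example 3:

```python
# Toy check of the revision operators of Definitions 5 and 6 on the model of
# Example 3. A partition maps each state to its cell (a frozenset of states).
states = ["w1", "w2", "w3", "w4"]
Pi_1 = {w: frozenset({"w1", "w2"}) if w in ("w1", "w2") else frozenset({"w3", "w4"})
        for w in states}
Pi_2 = {w: frozenset({"w1", "w3"}) if w in ("w1", "w3") else frozenset({"w2", "w4"})
        for w in states}

def meet(P, Q):
    """P ∧ Q: coarsest common refinement (cellwise intersection)."""
    return {w: P[w] & Q[w] for w in P}

def join(P, Q):
    """P ∨ Q: finest common coarsening (merge chain-overlapping cells)."""
    cell = {w: frozenset(P[w] | Q[w]) for w in P}
    changed = True
    while changed:
        changed = False
        for w in states:
            grown = frozenset().union(*(cell[v] for v in cell[w]))
            if grown != cell[w]:
                cell[w], changed = grown, True
    return cell

# Formal authority: the principal ends with the trivial partition {Omega} ...
print(join(Pi_1, Pi_2)["w1"] == frozenset(states))  # True
# ... real authority: everyone ends with singleton cells {w}.
print(meet(Pi_1, Pi_2)["w1"])  # frozenset({'w1'})
```

With the trivial partition the principal's conjecture is common knowledge among the agents (part (i)); with singleton cells it is not, and the moral hazard persists (part (ii)).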

Remark 2. The above assertion (i) remains true when the protocol is acyclic (i.e., it contains no cycles) in the P-formal authority communication.

VIII. CONCLUDING REMARKS

A. Conclusion

This paper advocates a new approach to the moral hazard problem in the principal-agent model, focused on beliefs about effort levels from the epistemic point of view. Highlighting the structure by which the principal and agents share private information about their conjectures on effort levels through common knowledge can help us make progress on the problematic aspects of classical principal-agent models. In particular, common knowledge of the conjectures plays a crucial role in removing the moral hazard in the classical principal-agent model. The moral hazard can be made to disappear at the maximal effort level of the refund function if common knowledge of the expectations of the others' effort levels holds, or if an adequate communication of the conjectures on the effort levels among the principal and the agents (e.g. P-formal authority communication) can be achieved.

B. Discussion

We remark on our results and the assumptions of the principal-agent model under uncertainty.

Common knowledge: Since the notion of common knowledge is defined as an infinite regress of mutual knowledge, the assumption seems very strong, and we would like to remove it from our investigation. However, in this paper the common-knowledge assumption plays a central role in solving our contract design problem. In fact, even in the P-formal authority communication, the resolution of moral hazard technically depends on common knowledge (Theorem 3), and the assumption is crucial in view of Example 3. On this account we have to take a new approach to our contract design problem if common knowledge is to be essentially removed from it.

Information partition: We require that each of the principal and the agents has an information structure given by a partition of the state space.
The structure is a class of information functions satisfying Ref, Trn and Sym, or equivalently a class of knowledge operators having the postulates K, 4 and 5. Among these, the postulate Sym (equivalently 5) is considered a problematic assumption (Bacharach [4]). In particular, 5 is called the Axiom of Wisdom, which is interpreted as saying that $i$ knows what he/she does not know. Can we remove some of these postulates in our framework? In fact, Theorems 2 and 3 can be verified without 5, but not without the others. The details will be reported elsewhere in the near future.

Communication: We have shown Theorem 3 only for the case of the particular protocol of P-authority communication. It can plainly be shown that the theorem remains true for any acyclic protocol, and further that it is valid for any cyclic protocol too. For communication in a principal-agent model under uncertainty, the relationship between a resolution of moral hazard and the topology of the communication graph is not clear, and it remains to be discussed further. The notion of formal and real authority communication presented in this paper is inspired by Aghion and Tirole [1]. We hope the communication model will be a useful tool in political economy.

Infinite state space: This paper treats only the case of a finite state space. How can we extend Theorems 2 and 3 to an infinite state space? We can easily see that Theorem 2 is valid in this case without any changes, but Theorem 3 is not. To make Theorem 3 valid we have to impose additional assumptions for convergence of the communication process. This issue also admits further discussion.

Applications: We have not analyzed applications of the above model other than the moral hazard. Can we treat the Global Financial Crisis within our framework? How about the extended agreement issues on the TPP (Trans-Pacific Strategic Economic Partnership)? These are of course interesting and important challenges.
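The partition-based knowledge operator underlying postulates K, 4 and 5 can be checked mechanically on a toy example. The sketch below (with an illustrative four-state partition) defines $K(E) = \{\omega \mid \Pi(\omega) \subseteq E\}$ and verifies positive introspection (4) and negative introspection (5) over all events:

```python
import itertools

# Toy check of the knowledge operator K derived from an information partition,
# and of postulates 4 (K E ⊆ K K E) and 5 (¬K E ⊆ K ¬K E) over all events.
# The 4-state space and the partition are illustrative.
states = frozenset({"w1", "w2", "w3", "w4"})
Pi = {w: frozenset({"w1", "w2"}) if w in ("w1", "w2") else frozenset({"w3", "w4"})
      for w in states}

def K(E):
    """K(E): the set of states at which the agent knows the event E."""
    return frozenset(w for w in states if Pi[w] <= E)

ok4 = ok5 = True
for r in range(len(states) + 1):
    for combo in itertools.combinations(sorted(states), r):
        E = frozenset(combo)
        ok4 = ok4 and K(E) <= K(K(E))                      # postulate 4
        ok5 = ok5 and (states - K(E)) <= K(states - K(E))  # postulate 5
print(ok4, ok5)  # True True
```

Both postulates hold automatically here because $K(E)$ is always a union of partition cells; dropping Sym (5) would mean replacing the partition with a non-symmetric possibility correspondence.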
Actually much more still remains to be investigated further.

C. Appraisal

Finally, we end with an appraisal. Our results recommend that, to remove the moral hazard in buyer-supplier cooperation management, the parties (the buyer and the suppliers) should fully share information only on their conjectures about the others' effort levels, but not on expected marginal costs, by making the efforts common knowledge or by communicating their beliefs about the efforts. This claim is not so fresh to us: in a cooperative organization system, the agents concerned will, tacitly or consciously, share their information, and we well know that this is the first step towards successful cooperation. The point of this paper is to make clear what kind of information we have to share: it is the information on the beliefs about the others' effort levels, not on our expected marginal costs.

REFERENCES

[1] Aghion, P. and Tirole, J. (1997). Formal and real authority in organizations. Journal of Political Economy, Vol. 105(1).
[2] Arrow, K. J. (1963). Uncertainty and the welfare economics of medical care. American Economic Review, Vol. 53.
[3] Aumann, R. J. (1976). Agreeing to disagree. Annals of Statistics, Vol. 4.
[4] Bacharach, M. (1985). Some extensions of a claim of Aumann in an axiomatic model of knowledge. Journal of Economic Theory, Vol. 37.
[5] Binmore, K. (1992). Fun and Games. D. C. Heath and Company, Lexington, Massachusetts.
[6] Holmstrom, B. (1982). Moral hazard in teams. Bell Journal of Economics, Vol. 13.
[7] Krasucki, P. (1996). Protocols forcing consensus. Journal of Economic Theory, Vol. 70.
[8] Matsuhisa, T. Moral hazard resolved by common-knowledge in principal-agent model. International Journal of Intelligent Information and Database Systems, Vol. 6, No. 3.
[9] Matsuhisa, T. and Jiang, D.-Y. Moral hazard resolved in communication network. World Journal of Social Sciences, Vol. 1(3).
[10] Parikh, R. and Krasucki, P. (1990). Communication, consensus, and knowledge. Journal of Economic Theory, Vol. 52.
[11] Radner, R. (1986). Repeated partnership games with imperfect monitoring and no discounting. Review of Economic Studies, Vol. 53.
[12] Williams, S. and Radner, R. Efficiency in partnerships when the joint output is uncertain. Discussion Paper No. 76, Kellogg School of Management, Northwestern University.

Topic 6: Artificial Intelligence and Multimedia

Application of Curvelets and SVM Regression for Image Compression

Zikiou Nadia, Lahdir Mourad, Ameur Soltane
Laboratoire d'Analyse et de Modélisation des Phénomènes Aléatoires (LAMPA), Département d'Electronique, Université Mouloud Mammeri (UMMTO), Tizi-Ouzou, Algérie

Abstract — In this article, we propose a new image compression algorithm using the second-generation Fast Curvelet transform and SVM (Support Vector Machine) regression. SVM is a learning method developed on the basis of Statistical Learning Theory. We chose the epsilon-SVR model for SVM regression. The Curvelet transform captures spatio-temporal structures at a given scale and orientation. Our algorithm applies SVR training to the coefficients obtained from the Curvelet transform of the image to be compressed. The support vector (SV) weights are then quantized and coded using Run-Length Encoding (RLE) and arithmetic coding. Applying our compression algorithm to a set of test images achieved a Peak Signal-to-Noise Ratio (PSNR) of 28.80 dB at a compression ratio (CR) of 25.22 on the "Lena" image. Comparing the results of our Curvelet-based algorithm with those obtained by compression methods using the wavelet transform shows a gain of about 2 dB in favour of the Curvelet transform.
The results obtained show that our algorithm achieves high compression ratios with very good image quality.

Keywords — image compression; second-generation Fast Curvelet transform; wavelet transform; Support Vector Regression (SVR); RLE and arithmetic coding.

I. INTRODUCTION

Image compression consists of reducing the physical size of blocks of information. The main objective is to reduce both spatial and spectral redundancy so that the data can be stored or transmitted appropriately. Several techniques are used to achieve this objective; in this article we focus on lossy image compression using two transforms: Curvelets and wavelets [1]. Since their introduction at the beginning of the 1980s, wavelets have received much attention in fields as diverse as denoising, compression, coding, and medical or satellite imaging. However, it is clear today that wavelets are not optimal for the analysis of anisotropic objects in an image (lines, contours, ...). In recent years, new multiscale transforms have been developed, such as curvelets, contourlets and bandlets, which integrate the notion of directionality and make it possible to seek out such objects optimally. Curvelets, proposed by E. Candès and D. Donoho [2], constitute a new family of geometric wavelet frames, more efficient than traditional transforms, designed to represent contours sparsely. Compared with wavelets, the curvelet transform can represent a smooth contour with fewer coefficients for the same accuracy. The curvelet transform is a multiscale, multidirectional transform with atoms indexed by position, scale and direction parameters [2, 3].
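The quality figure quoted in the abstract is the standard PSNR. A minimal sketch of its computation for 8-bit images (the tiny test arrays are illustrative):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Tiny illustrative example: a uniform error of 4 grey levels gives MSE = 16.
a = np.full((2, 2), 100, dtype=np.uint8)
b = np.full((2, 2), 104, dtype=np.uint8)
print(round(psnr(a, b), 2))  # 36.09
```

The compression ratio reported alongside it is simply the size of the original bitstream divided by the size of the coded one.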
SVM (Support Vector Machine) is a learning method developed on the basis of the Statistical Learning Theory of Vapnik and Chervonenkis [4].

It makes it possible to perform estimation for classification (with two or more classes) [5] or for regression [6]. In this article we present an image compression algorithm that applies SVR training to the coefficients obtained from the second-generation curvelet transform of the image to be compressed. RLE and arithmetic coding are used to code the vectors of SV weights [7]. Some of the subbands do not contain a large amount of energy and therefore have little effect on image quality, so they are discarded directly. This article is structured in several sections. The second and third sections present the principles of the curvelet transform and of SVM regression. The chosen image compression method is presented in the fourth section. The results obtained by applying the method to several test images are computed and discussed in the fifth section.

II. SECOND-GENERATION FAST CURVELETS

The Curvelets-99 proved very effective in denoising [8, 9] as in other applications [10, 11], but suffer from a number of drawbacks: their construction by successive stages is complex, and they have a large number of parameters (two of scale, three of position, and one of angle). Moreover, they are very redundant. These drawbacks led to the construction of another type of curvelets, having only three parameters and lower redundancy. This new transform, introduced by Candès et al. [2], is implemented as a tiling of Fourier space, regarded as $[-1, 1]^2$.

A. The continuous transform

The continuous transform is defined using the windows $V$ and $W$, angular and radial respectively, which act in the Fourier domain: $W(r)$, $r \in [1/2, 2]$, and $V(t)$, $t \in [-1, 1]$.
For every scale $j \ge j_0$, we define a frequency window $U_j$ by

$U_j(r, \theta) = 2^{-3j/4}\, W(2^{-j} r)\, V(2^{\lfloor j/2 \rfloor} \theta / 2\pi)$ (1)

We can thus define the curvelet at scale $j$ in the Fourier domain,

$\hat\psi_j(w) = U_j(w)$ (2)

the others being deduced from it by rotations and translations. The curvelet at scale $2^{-j}$, of orientation $\theta_l = 2\pi \cdot 2^{-\lfloor j/2 \rfloor} l$, $l = 0, \dots, 2^{\lfloor j/2 \rfloor} - 1$, and of position $x_k^{(j,l)} = R_{\theta_l}^{-1}(k_1 2^{-j}, k_2 2^{-j/2})$, is therefore defined by

$\psi_{j,l,k}(x) = \psi_j\big(R_{\theta_l}(x - x_k^{(j,l)})\big)$ (3)

The curvelet decomposition is obtained by a simple inner product against this family of functions, $c(j, l, k) = \langle f, \psi_{j,l,k} \rangle$, which can be expressed in the frequency domain by Parseval's theorem:

$c(j, l, k) = \frac{1}{(2\pi)^2} \int \hat f(w)\, U_j(R_{\theta_l} w)\, e^{i \langle x_k^{(j,l)}, w \rangle}\, dw$ (4)

Figure 1. Tiling of the continuous curvelets: (a) in the frequency domain, (b) in the spatial domain [12].

B. The discrete transform

For the discrete version, the shape of the windows supporting the curvelets must be changed somewhat to adapt them to a Cartesian grid. Thus, the tiling of the frequency plane into concentric circular rings (bandpass filters) becomes concentric Cartesian coronae when the filters are applied to the two directions $w_1$ and $w_2$ independently. Let $\phi$ be a function supported in $[-2, 2]$ and equal to 1 on $[-1/2, 1/2]$. Set $\Phi_j(w) = \phi(2^{-j} w_1)\,\phi(2^{-j} w_2)$. We define the family of bandpass filters:

$W_j^D(w)^2 = \Phi_{j+1}(w)^2 - \Phi_j(w)^2$, $j \ge 0$ (5)

The coefficients of the discrete curvelet transform are obtained by:

$c(j, l, k) = \int \hat f(w)\, U_j(S_{\theta_l}^{-1} w)\, e^{i \langle S_{\theta_l}^{-T} b, w \rangle}\, dw$ (6)

Figure 2. Spatial and frequency partitionings generated by the discrete curvelets [12].
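The bandpass windows of Eq. (5) can be checked numerically: by construction their squares telescope, $\Phi_0^2 + \sum_{j<J} (W_j^D)^2 = \Phi_J^2$, which is what makes them a tiling of the frequency plane into Cartesian coronae. The sketch below uses a hypothetical piecewise-linear low-pass $\phi$ (equal to 1 on $[-1/2, 1/2]$ and supported in $[-2, 2]$); it is not the window used in CurveLab:

```python
import numpy as np

# Hypothetical piecewise-linear low-pass phi: 1 on [-1/2, 1/2], 0 outside [-2, 2].
def phi(t):
    a = np.abs(t)
    return np.clip((2.0 - a) / 1.5, 0.0, 1.0)

def Phi(j, w1, w2):
    return phi(2.0 ** (-j) * w1) * phi(2.0 ** (-j) * w2)

def W_D(j, w1, w2):
    """Bandpass window of Eq. (5): (W_j^D)^2 = Phi_{j+1}^2 - Phi_j^2."""
    return np.sqrt(np.maximum(Phi(j + 1, w1, w2) ** 2 - Phi(j, w1, w2) ** 2, 0.0))

w1, w2 = np.meshgrid(np.linspace(-8, 8, 65), np.linspace(-8, 8, 65))
J = 4
total = Phi(0, w1, w2) ** 2 + sum(W_D(j, w1, w2) ** 2 for j in range(J))
print(bool(np.allclose(total, Phi(J, w1, w2) ** 2)))  # True
```

Any smooth $\phi$ that is non-increasing in $|t|$ works the same way, since $\Phi_{j+1} \ge \Phi_j$ pointwise guarantees the square root is real.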

III. SVM REGRESSION FOR IMAGE COMPRESSION

In their origin, SVMs were developed for classification problems; however, their nature also allows them to solve regression problems. Regression is a particular case of classification in which the classes of the examples are not countable, i.e. they are continuous. The problem consists in finding, using $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$, a function $f : \mathbb{R}^m \to \mathbb{R}$ that approximates the $y_i$ as closely as possible; in other words, one that minimizes the difference between the $f(x_i)$ and the $y_i$. Often $f$ is considered a linear function, $f(x) = \langle w, x \rangle + b$, where $w$ is a vector and $b$ a scalar. The problem therefore amounts to finding a hyperplane characterized by $w$ and $b$ that minimizes the overall deviation between $f$ and the $y_i$ (equation (7)):

$(w^*, b^*) = \arg\min_{w,b} \sum_{i=1}^n (y_i - \langle w, x_i \rangle - b)^2$ (7)

To solve this problem, SVMs use a trick that consists in modelling the regression function by a hyperplane located at the centre of a hyper-tube of width $2\varepsilon$ containing all the training examples (Figure 3).

Figure 3. Hyper-tube modelling the regression function [13].

Several hyper-tubes of width $2\varepsilon$ containing all the training examples may exist. The optimal hyper-tube is the one that minimizes the distance between the training examples and its boundaries; in other words, the one that maximizes the distance of the examples from the central hyperplane (Figure 3). Determining the optimal hyper-tube is similar to determining the optimal maximum-margin hyperplane in the classification case: we must therefore seek a maximum-margin hyper-tube with all the training examples inside it. By an analysis similar to that of the classification problem, the solution of the regression problem is reduced to solving the dual quadratic optimization problem of equation (8):

$\max_{\alpha, \alpha^*} \; -\tfrac12 \sum_{i,j=1}^n (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_{i=1}^n (\alpha_i + \alpha_i^*) + \sum_{i=1}^n y_i (\alpha_i - \alpha_i^*)$

subject to $\sum_{i=1}^n (\alpha_i - \alpha_i^*) = 0$ and $0 \le \alpha_i, \alpha_i^* \le C$ (8)

where the $\alpha_i$ and $\alpha_i^*$ are the coefficients of the examples above and below the hyperplane respectively, and $C$ is a parameter penalizing them. The output function $f(x)$ is given by equation (9):

$f(x) = \sum_{i=1}^n (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b$ (9)

where $b$ is computed from an example with $0 < \alpha_i < C$ (a support vector) by equation (10):

$b = y_i - \langle w, x_i \rangle - \varepsilon$ (10)

Use of kernels. Among the strongest motivations for the development of support vector machines for regression is their simple extension to nonlinear cases through the use of kernels. Indeed, in a manner similar to the classification case, a transformation of space is performed so as always to face a linear regression.
The inverse space transformation makes it possible to return to the original space after solving in the new space (Figure 4).

Figure 4. Use of kernels for solving nonlinear regression [13].

The transformation and its inverse are carried out through a real function $K(x_i, x_j)$ called a kernel. The inner product in equations (8) and (9) is replaced by the kernel function. These kernel functions must satisfy Mercer's conditions. Here are some examples of kernels:

1) Linear: $k(x, x') = \langle x, x' \rangle$ (11)
2) Polynomial: $k(x, x') = \langle x, x' \rangle^d$ or $(c + \langle x, x' \rangle)^d$ (12)
3) Gaussian: $k(x, x') = e^{-\|x - x'\|^2 / \sigma}$ (13)
4) Sigmoid: $k(x, x') = \tanh(\alpha_0 \langle x, x' \rangle + \beta_0)$ (14)

IV. PROPOSED METHOD

The proposed method is based on the use of SVR for compressing the second-generation curvelet coefficients. We opted for the epsilon-SVR model and fixed the number of decomposition levels to 6.
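The epsilon-SVR fit of this kind can be sketched with scikit-learn's `SVR` class (which wraps LIBSVM). The block values and the hyperparameters `C`, `epsilon` and `gamma` below are illustrative, not the ones used in the paper; the point is that the block is then represented by its support vectors only:

```python
import numpy as np
from sklearn.svm import SVR

# Sketch of the SVR stage on one 4x4 block of hypothetical coefficients:
# pixel coordinates are the inputs, coefficient values are the targets.
rng = np.random.default_rng(0)
block = rng.normal(size=(4, 4))          # stand-in for curvelet detail coefficients

X = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
y = block.ravel()

# epsilon sets the tube width, and hence the trade-off between the number
# of support vectors kept and the reconstruction fidelity.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
model.fit(X, y)

approx = model.predict(X).reshape(4, 4)  # reconstruction from the SVs
n_sv = len(model.support_)
print(approx.shape, n_sv <= 16)          # (4, 4) True
```

Increasing `epsilon` widens the tube, drops more points from the support set, and raises the compression at the cost of fidelity.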
The following diagram summarizes the procedure of the proposed algorithm:

IV.1. Application of the Curvelet transform

The original image is decomposed using the second-generation fast curvelet transform into three levels: Coarse, Detail and Fine. The Coarse level contains the low-frequency subbands; this level is therefore coded using Differential Pulse Code Modulation (DPCM) in order to preserve the relevant information of the image. The Fine level contains the high frequencies, which are discarded during coding. The intermediate frequencies are assigned to the Detail levels; these undergo SVR training on the coefficients obtained from the curvelet transform of the image to be compressed. The support vector (SV) weights are then quantized and coded using RLE and arithmetic coding.

Figure 5. Diagram of the proposed compression algorithm.

The procedure of the coding algorithm is given as follows:

IV.2. Dead-zone quantization of the coefficients

The curvelet transform of the image produces very many non-significant values around zero, which would penalize the rest of the coding process. Typically, the values lying within the dead zone are quantized to zero and are therefore not considered by the entropy coding. Using dead-zone quantization, we quantize the coefficients obtained from the curvelet decomposition. From experiments, the optimal size of the dead zone is taken between 0.6 and …

IV.3. Coding of the low-frequency subbands

The low-frequency subbands are coded using DPCM (Differential Pulse-Code Modulation). DPCM makes it possible to compress intelligently a plane containing homogeneous values. We consider here that in the Coarse level (the homogeneous information of the image), the value of a pixel is strongly correlated with those of its neighbours, passed through a predictor. The predictor is a simple matrix of coefficients weighting the neighbouring pixels of the pixel to be coded. The error between the original plane and the predicted plane is then transmitted, and the gain of this solution rests on the fact that this error can in general be coded with far fewer bits per pixel.

IV.4. Normalization of the Curvelet coefficients

We used equation (15) to normalize the Detail levels:

$c' = \dfrac{c - c_{min}}{c_{max} - c_{min}}$ (15)

where $c_{max}$ and $c_{min}$ are respectively the maximum and minimum values of the curvelet coefficients, $c$ is the coefficient to be normalized and $c'$ is its value after normalization [7].

IV.5. Training the Curvelet coefficients with SVMs

For SVM training, two parameters affect the compression efficiency: the type of kernel and its parameters. In general, the kernel types used for SVMs are the linear, polynomial and Gaussian kernels. The distribution of the coefficients in a block is approximately considered a Gaussian distribution [14]; we therefore chose the Gaussian function as the kernel type for SVM regression and applied it to the coefficients of the Detail levels. Figure 6 shows an example of SVR simulation results on a 4×4 block.

Figure 6. Simulation results on a 4×4 block of the Lena image with Gaussian-kernel SVM regression and Curvelets.

Two versions of SVM regression are commonly used: epsilon-SVR and nu-SVR. Epsilon and nu are merely different versions of the penalty parameter; the same optimization problem is solved in both cases [15]. We chose epsilon-SVR because it was the original formulation and is the most commonly used form.

IV.6.
Coding of the coefficients

The support vectors (SVs) are combined with their values (weights), and RLE and arithmetic coding are then applied to this vector. The high-frequency sub-bands are discarded at the compression stage but are added back at reconstruction.

VI. RESULTS AND DISCUSSION

The algorithm is implemented in MATLAB (R2012a); we used LIBSVM for the SVM regression [16] and, for the second-generation Curvelet transform, the implementation provided in the CurveLab toolbox [17].

The comparison of the compression results of the proposed algorithm with those of wavelet-based methods is given in Table 1.

TABLE 1. PSNR RESULTS OF OUR ALGORITHM COMPARED WITH WAVELET-BASED METHODS

Image          | RC    | PSNR (Wavelets) | PSNR (Curvelets)
Bateau 256x256 | 24.66 | 27.06           | 28.10
Femme 256x256  | 24.19 | 26.70           | 27.35
TIGRE 256x256  | 23.54 | 27.78           | 28.90
Aerien 256x256 | 22.56 | 30.08           | 31.59
Lena 512x512   | 25.22 | 27.04           | 28.80
CITY 256x256   | 22.30 | 30.50           | 31.06

The results show that high compression ratios can be reached with good PSNR values. Indeed, our algorithm reaches a PSNR of up to 28.80 dB for the 512x512 "Lena" image with a Gaussian SVR kernel. With this same kernel we observe a typical PSNR improvement of about 1 dB for the Femme and CITY images, and of more than 1 dB for the Bateau, TIGRE, Aerien and Lena images. The figures show a significant gain in visual quality, due to a better respect of the image geometry, and clearly demonstrate the benefit of our Curvelet coder for compression. Figure 7 gives an overview of the test images used; Figures 8, 10, 12, 14, 16 and 18 show zooms of the original images, and Figures 9, 11, 13, 15, 17 and 19 allow a visual comparison between the images produced by the two algorithms. A better quality can be observed for the images produced by the Curvelet algorithm, which yields much smoother edges and better respects the contours.

Figure 8. Original Bateau image (zoom).
Figure 9. Reconstructed Bateau images: (a) wavelet method, (b) Curvelet method.
Figure 10. Original Femme image (zoom).
Figure 11. Reconstructed Femme images: (a) wavelet method, (b) Curvelet method.
Figure 12. Original Tigre image (zoom).
Figure 7. Test images.
(a) Bateau, (b) Femme, (c) Tigre, (d) Aerien, (e) City, (f) Lena.

Figure 13. Reconstructed Tigre images: (a) wavelet method, (b) Curvelet method.
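The per-block regression of Section V.5 (illustrated by Figure 6) can be sketched in numpy. Since LIBSVM's epsilon-SVR is not reproduced here, Gaussian-kernel ridge regression is used as a simpler stand-in that also yields one dual weight per training point; the kernel width, regularization constant and block values are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=2.0):
    """RBF kernel matrix between two sets of points (gamma is an assumption)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Hypothetical 4x4 block of Detail-level coefficients (illustrative values).
block = np.array([[4.0, 3.5, 0.2, 0.1],
                  [3.8, 3.2, 0.1, 0.0],
                  [0.3, 0.2, 2.9, 3.1],
                  [0.1, 0.0, 3.0, 3.3]])

# Inputs are the pixel coordinates, targets are the coefficient values.
s, t = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
X = np.stack([s.ravel(), t.ravel()], axis=1).astype(float)
y = block.ravel()

lam = 1e-3                                         # regularization constant
K = gaussian_kernel(X, X)
w = np.linalg.solve(K + lam * np.eye(len(y)), y)   # one dual weight per point

y_hat = K @ w                                      # reconstructed block values
sv_mask = np.abs(w) > 1e-2                         # large weights play the role
                                                   # of the coded "support vectors"
```

In the paper, only the (sparse) support-vector weights survive to the RLE/arithmetic coding stage; thresholding `w` above mimics that sparsification.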

Figure 14. Original Aerien image (zoom).
Figure 15. Reconstructed Aerien images: (a) wavelet method, (b) Curvelet method.
Figure 16. Original City image (zoom).
Figure 17. Reconstructed City images: (a) wavelet method, (b) Curvelet method.
Figure 18. Original Lena image (zoom).
Figure 19. Reconstructed Lena images: (a) wavelet method, (b) Curvelet method.

VII. CONCLUSION

In this paper we proposed a new image compression algorithm using SVM regression and the second-generation fast Curvelet transform. The results show a PSNR gain over wavelet-based methods at the same compression ratio (RC), together with clearly better image quality. As future work, a better selection of the SVM kernel parameters could reduce the number of support vectors and improve the compression ratio. Given these first results, which show the potential of the approach, the method could be directed towards several applications such as biomedical imaging, biometrics, hyperspectral imaging and object detection.

REFERENCES

[1] D. Nilima Maske and V. Wani Patil, "Comparison of Image Compression using Wavelet and Curvelet Transform and Transmission over Wireless Channel," International Journal of Scientific and Research Publications, vol. 2, issue 5, May 2012.
[2] E. Candès and D. Donoho, "Curvelets: A Surprisingly Effective Nonadaptive Representation of Objects with Edges," in Curves and Surfaces, Vanderbilt University Press, Nashville, TN.
[3] E. Candès and D. Donoho, "New Tight Frames of Curvelets and Optimal Representations of Objects with C² Singularities," Department of Statistics, Stanford University, USA, November.
[4] V.
Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[5] C. J. C. Burges, "A tutorial on Support Vector Machines for pattern recognition," Data Mining and Knowledge Discovery, 1998.
[6] A. J. Smola and B. Schölkopf, "A tutorial on Support Vector Regression," NeuroCOLT2 Technical Report Series, NC2-TR, October 1998.
[7] S. Fazli, S. Toofan and Z. Mehrara, "JPEG2000 Image Compression Using SVM and DWT," International Journal of Science and Engineering Investigations, vol. 1, issue 3, April 2012.
[8] J.-L. Starck, E. J. Candès and D. L. Donoho, "The curvelet transform for image denoising," IEEE Transactions on Image Processing, June 2002.
[9] J.-L. Starck, D. L. Donoho and E. J. Candès, "Very high quality image restoration by combining wavelets and curvelets," in A. Laine, M. A. Unser and A. Aldroubi, editors, SPIE Conference on Signal and Image Processing: Wavelet Applications in Signal and Image Processing IX, San Diego, 1-4 August, SPIE.
[10] J.-L. Starck, M. K. Nguyen and F. Murtagh, "Wavelets and curvelets for image deconvolution: a combined approach," Signal Processing, 2003.
[11] M. Elad, J.-L. Starck, P. Querre and D. L. Donoho, "Simultaneous cartoon and texture image inpainting using morphological component analysis," Applied and Computational Harmonic Analysis, 2005.
[12] E. J. Candès, L. Demanet, D. L. Donoho and L. Ying, "Fast Discrete Curvelet Transforms," Stanford University technical report, July 2005.
[13] A. Djeffal, "Utilisation des méthodes Support Vector Machine (SVM) dans l'analyse des bases de données," Department of Computer Science, Mohamed Khider University, Biskra.
[14] X. L. Wang, H. Han and S. L. Peng, "Image Restoration Based on Wavelet-domain Local Gaussian Model," Journal of Software, vol. 15, no. 3, 2004.
[15] C.-C. Chang and C.-J. Lin, "Training support vector regression: Theory and algorithms," Neural Computation, 2002.
[16] LIBSVM.
[17] CurveLab Toolbox, http ://w

No-reference Image Quality Assessment for JPEG2000 Compressed Images Using Natural Scene Statistics and Spatial Pooling

Dakkar Borhen eddine
Laboratoire d'Automatique et de Robotique, Faculté des Sciences de la Technologie, Université Constantine 1, Constantine, Algérie

Hachouf Fella
Laboratoire d'Automatique et de Robotique, Faculté des Sciences de la Technologie, Université Constantine 1, Constantine, Algérie

Abstract — A no-reference image quality metric is proposed to assess JPEG2000-compressed images, whose main compression artefacts are blur and ringing. The new metric first computes natural scene statistics (NSS); then, because the change in monotone intensity is very sensitive to the presence of blur, the monotone changing (MC) is calculated over each block. Zero crossings (ZC) are obtained via the Laplacian-of-Gaussian operator. A spatially weighted pooling approach is adopted to obtain the score of the perceived image and, to achieve good correlation with human judgements, the above calculations are performed at two scales: the original scale and a reduced one. Tests have been conducted on the LIVE and IVC databases. The results show that the proposed measure performs better than the full-reference peak signal-to-noise ratio (PSNR) and is comparable to the structural similarity index (SSIM).

Keywords — Image Quality Assessment (IQA), Natural Scene Statistics (NSS), Monotone Changing (MC), Zero Crossing (ZC), Spatial pooling

I. INTRODUCTION

Nowadays most of us use devices such as phones, computers and tablets, most of which are equipped with high-definition cameras and can capture, store, transmit and display images or videos. All of these functions affect the perceptual information contained in the images by introducing distortions that disturb the perceptual quality of the viewed scene.
An image affected by one or more distortions has a degraded quality, which is why evaluating the quality of an image is of great interest. The field of image quality assessment (IQA) has attracted many researchers, as evaluating the quality of an image is a difficult task. The quality of an image can be evaluated in two different ways: subjectively or objectively. In the subjective approach the human is the ultimate judge of quality: several tests are conducted and a final score is obtained that indicates the quality level. Different subjective methods exist in the literature, but their major drawback is that they are not practical; they are time consuming and very costly. Objective methods, on the other hand, are algorithms which automatically predict quality scores, the main goal being agreement with human judgement. Objective methods can be classified into three groups according to the availability of the reference image. Full-reference (FR) methods assume the reference (pristine) image is completely known and compare it with the image under assessment; the problem for this group is that the availability of the reference is not guaranteed at all times, which makes FR methods impractical. Reduced-reference (RR) methods use partial information about the pristine image: some features are extracted from the original image, sent along with the received image, and finally exploited to design a metric. The third group consists of no-reference or blind (NR) metrics, which differ from the previous methods in that nothing is provided except the image under assessment. They are the most practical, and to date no universal method is able to predict quality.
In the literature, the majority of no-reference metrics are distortion-specific: the distortion affecting the image is known in advance. Many researchers have focused on the artefacts introduced by the JPEG2000 compression standard, whose main artefacts are blur and ringing. Fig. 1 presents an image compressed with JPEG2000, and Fig. 2 presents a zoom on a blurred region and on the ringing effect. Blur in images is due to the attenuation of the high-spatial-frequency coefficients, while ringing artefacts are caused by the quantization or truncation of the high-frequency coefficients resulting from wavelet-based coding; this is known as the Gibbs phenomenon. The distortions introduced by JPEG2000 have been studied in many works. In [1] P. Marziliano presents two metrics, one for blur (full- and no-reference) and another for ringing (full-reference). They have a low computational complexity because they are based on the analysis of edges and adjacent regions in an image. The proposed blur measure is based on the calculation of the edge spread. For the full-reference version the edges are detected in the original image, whereas for the no-reference version edges are obtained from the compressed image, so the measure depends on the amount of compression. The ringing metric is based on the blur measure.

The width of the ringing is computed on the left and on the right of the edges. In [2] the authors proposed a measure based on natural scene statistics (NSS) to assess the quality of JPEG2000-compressed images. They start from the idea that JPEG2000 compression disturbs the non-linear dependencies contained in natural scenes; the quantified deviation from the NSS model is then used to predict image quality. In [3] H. Liu et al. proposed a method which extracts the regions affected by the ringing artefact; the local visibility of ringing is estimated in these regions and compared to the local background, and the scores of the local ringing regions are averaged into a global score for the image. In [4] a neural network simulating the human visual system is used to assess perceived quality: features extracted from the images and the corresponding subjective scores are used to learn the relationship between them. For JPEG-coded images the block effect is considered the most relevant feature, while for JPEG2000 compression blur is the most relevant feature.

In this paper we present a no-reference algorithm for assessing the quality of images compressed with JPEG2000, targeting its two significant distortions, blur and ringing. First, the natural scene statistics (NSS) of Ruderman [6] are extracted and used to calculate the monotone changing over the horizontal and vertical directions; then zero crossings are computed, and a spatial pooling approach [7] produces the final score. The rest of this paper is organized as follows: Section 2 describes the proposed measure, Section 3 presents the experimental results and comparisons, and Section 4 concludes this work.

II. PROPOSED METHOD

A. Natural Scene Statistics in the Spatial Domain

Fig. 1. An image from the LIVE database showing the distortions introduced by JPEG2000 compression.

Fig. 2. Zoom of Fig. 1; the left image shows the ringing effect, the right image the blur effect.

Given the distorted image I(i, j), compute the normalized luminances via local mean subtraction and divisive normalization [6]:

Î(i, j) = (I(i, j) - µ(i, j)) / (σ(i, j) + C)    (1)

where i ∈ {1, 2, ..., M} and j ∈ {1, 2, ..., N} are spatial indices, M and N are the image height and width respectively, and C = 1 is a constant that prevents instabilities when the denominator tends to zero [8]. µ and σ are given by

µ(i, j) = Σ_{k=-K..K} Σ_{l=-L..L} w_{k,l} I_{k,l}(i, j)    (2)

σ(i, j) = sqrt( Σ_{k=-K..K} Σ_{l=-L..L} w_{k,l} (I_{k,l}(i, j) - µ(i, j))² )    (3)

where w = {w_{k,l} | k = -K, ..., K; l = -L, ..., L} is a 2D circularly-symmetric Gaussian weighting function. We use the normalized luminances because they are homogeneous for pristine images, and the signs of adjacent coefficients exhibit a regular structure that is affected by the presence of distortion [8].

B. Calculating the monotone changing and zero crossings

First, the normalized-luminance matrix is decomposed into non-overlapping blocks. Then the horizontal difference between two adjacent pixels in a block is computed as [5]:

D_H(s, t) = Î(s, t) - Î(s, t - 1)    (4)

where s = 1, ..., S, t = 2, ..., T, and S, T are the block height and width respectively. The sign of adjacent differences is calculated as:

S_H(s, t) = D_H(s, t) · D_H(s, t + 1)    (5)

where s = 1, ..., S and t = 2, ..., T - 1. Then the monotone changing is given by the indicator:

MC(s, t) = [S_H(s, t) > 0]    (6)
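A sketch of Eqs. (1)-(6) — the divisive normalization of Section II-A followed by the horizontal-difference, sign-product and MC-indicator steps of Section II-B. The 7x7 Gaussian window and its sigma are assumptions in the spirit of [8]; they are not values stated in this paper.

```python
import numpy as np

def gaussian_window(half=3, sigma=7.0 / 6.0):
    """2D circularly-symmetric Gaussian weights, normalized to sum to 1."""
    k = np.arange(-half, half + 1, dtype=float)
    w = np.exp(-(k[:, None] ** 2 + k[None, :] ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def normalize_luminance(I, C=1.0):
    """Eqs. (1)-(3): local mean subtraction and divisive normalization."""
    w = gaussian_window()
    h = w.shape[0] // 2
    Ipad = np.pad(np.asarray(I, dtype=float), h, mode="reflect")
    M, N = I.shape
    mu = np.empty((M, N))
    sigma = np.empty((M, N))
    for i in range(M):
        for j in range(N):
            patch = Ipad[i:i + 2 * h + 1, j:j + 2 * h + 1]
            mu[i, j] = (w * patch).sum()                               # Eq. (2)
            sigma[i, j] = np.sqrt((w * (patch - mu[i, j]) ** 2).sum()) # Eq. (3)
    return (I - mu) / (sigma + C)                                      # Eq. (1)

def monotone_change(block):
    """Eqs. (4)-(6): horizontal difference, sign product, MC indicator."""
    D = block[:, 1:] - block[:, :-1]   # Eq. (4)
    S = D[:, :-1] * D[:, 1:]           # Eq. (5)
    return (S > 0).astype(int)         # Eq. (6)

I = np.arange(64, dtype=float).reshape(8, 8)
Ihat = normalize_luminance(I)
MC = monotone_change(Ihat)             # binary map, shape (8, 6)
```

A constant (pristine, structure-free) patch normalizes to exactly zero, which is the homogeneity property the method relies on.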

The horizontal MC values are related to the monotone intensity change in the horizontal neighbourhood. We then calculate the spread of a group of consecutive horizontal MC pixels starting at (i, j1) and ending at (i, j2) [5]:

MC_S(k) = j2 - j1 + 3    (7)

where k denotes a group of consecutive pixels; the constant 3 is there to include the end pixels. The MC values are computed because each horizontal MC is sensitive to the change in monotone intensity. The MC of a group is defined as the product of the spread of the horizontal MC and the number of MC pixels (the number of MC pixels is the horizontal MC spread minus the end pixels):

MC_H(k) = MC_S(k) · (MC_S(k) - 2)    (8)

Finally, the MC of the block is defined as the normalized sum over all the monotone changing groups:

MC_H = (1 / (S·T)) Σ_k MC_H(k)    (9)

where the sum runs over the groups of consecutive MC pixels in the block. The MC in the vertical direction is obtained with the same algorithm after transposing the image block. After calculating both, the maximum of the vertical and horizontal MC is taken as the overall MC of the block:

MC_B = max{MC_H, MC_V}    (10)

The ZC are then determined using the Laplacian of Gaussian (LoG) function. The choice of the LoG is motivated by the fact that the image is smoothed by the Gaussian filter, removing noise, while the Laplacian operator detects the zero crossings [9]. The resulting image is decomposed into non-overlapping blocks, and for each block the ZC are counted as:

ZC_B = Σ_{s=1..S} Σ_{t=1..T} ZC(s, t)    (11)

where S and T are the block height and width respectively. We now move from these block characteristics to a global score describing the quality of the image using a spatial pooling approach.

C. Final score via a spatial pooling approach

A spatial pooling approach is adopted to transform the spatial characteristics into a global score describing the image quality.
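Eqs. (7)-(11) can be sketched as follows: a run-based spread computation, the per-block MC, and a zero-crossing count from a Laplacian-of-Gaussian response. The LoG kernel size and sigma are illustrative assumptions, and the test block is a toy example.

```python
import numpy as np

def mc_indicator(block):
    """Eqs. (4)-(6): horizontal MC indicator for each row of a block."""
    D = block[:, 1:] - block[:, :-1]
    return (D[:, :-1] * D[:, 1:] > 0).astype(int)

def mc_activity(mc_row):
    """Eqs. (7)-(8): a run of consecutive MC pixels from j1 to j2 has
    spread MC_S = j2 - j1 + 3 and contributes MC_S * (MC_S - 2)."""
    total, run = 0, 0
    for v in list(mc_row) + [0]:      # sentinel closes a trailing run
        if v:
            run += 1
        elif run:
            spread = run + 2          # j2 - j1 + 3, with run = j2 - j1 + 1
            total += spread * (spread - 2)
            run = 0
    return total

def block_mc(block):
    """Eqs. (9)-(10): normalized sum of group activities, then the max of
    the horizontal and vertical directions."""
    S, T = block.shape
    mch = sum(mc_activity(row) for row in mc_indicator(block)) / (S * T)
    mcv = sum(mc_activity(row) for row in mc_indicator(block.T)) / (S * T)
    return max(mch, mcv)

def log_kernel(size=5, sigma=1.0):
    """Zero-sum Laplacian-of-Gaussian kernel (size/sigma are assumptions)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()

def zero_crossings(img):
    """Eq. (11): count sign changes of the LoG response over the block."""
    k = log_kernel()
    h = k.shape[0] // 2
    pad = np.pad(np.asarray(img, dtype=float), h, mode="reflect")
    M, N = img.shape
    resp = np.empty((M, N))
    for i in range(M):
        for j in range(N):
            resp[i, j] = (k * pad[i:i + 2 * h + 1, j:j + 2 * h + 1]).sum()
    horiz = (resp[:, :-1] * resp[:, 1:] < 0).sum()
    vert = (resp[:-1, :] * resp[1:, :] < 0).sum()
    return int(horiz + vert)

block = np.arange(16, dtype=float).reshape(4, 4)
mc_b = block_mc(block)    # 2.0 for this monotone ramp
```

A flat block produces no zero crossings, consistent with ZC tracking contour (ringing) structure only.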
In [7] the authors studied different spatial pooling strategies for IQA. We adopt the Local Quality/Distortion-Weighted Pooling, in which the non-uniform quality distribution problem is addressed directly by assigning spatially varying importance (weights) over the image space. A general form of such spatial weighting is [7]:

M = ( Σ_{i=1..N} w_i m_i ) / ( Σ_{i=1..N} w_i )    (12)

where w_i is the weight assigned to the i-th spatial location and m_i is the local quality measure. In this work the final score is obtained as:

M = ( Σ_{i=1..B} ZC_B MC_B ) / ( Σ_{i=1..B} ZC_B )    (13)

with ZC_B the weight (w_i), MC_B the local quality measure (m_i) of each block, and B the total number of blocks in the image. We run the algorithm at two scales, the original image scale and a reduced resolution, because images are naturally multi-scale and extracting multi-scale information produces better correlation with human perception. The resolution is reduced by low-pass filtering the original image and downsampling by a factor of 2. The final score is then the average of the original-scale score and the reduced-scale score:

final score = (score1 + score2) / 2    (14)

where score1 and score2 are the scores obtained at the original and reduced scales respectively.

III. EXPERIMENTAL RESULTS

We tested the proposed measure on the LIVE IQA database [12], which consists of 29 reference images and 779 distorted images with the following distortion types: JPEG2000 (JP2K) and JPEG compression, additive white Gaussian noise (WN), Gaussian blur (Blur), and a Rayleigh fast-fading channel simulation (FF). Because the proposed measure targets the distortions introduced by JPEG2000, the evaluation is performed only on the 169 images compressed with the JPEG2000 standard.
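The pooling of Eqs. (12)-(14) reduces to a ZC-weighted average of per-block MC values, averaged over two scales; a minimal sketch with toy values (not results from the paper):

```python
import numpy as np

def pooled_score(mc_blocks, zc_blocks):
    """Eq. (13): ZC counts play the weights w_i, per-block MC the measures m_i."""
    mc = np.asarray(mc_blocks, dtype=float)
    zc = np.asarray(zc_blocks, dtype=float)
    return float((zc * mc).sum() / zc.sum())

def final_score(score1, score2):
    """Eq. (14): average of the original-scale and reduced-scale scores."""
    return (score1 + score2) / 2.0

mc = [2.0, 4.0, 6.0]      # toy per-block MC values
zc = [1.0, 1.0, 2.0]      # toy per-block ZC counts
s1 = pooled_score(mc, zc)  # (2 + 4 + 12) / 4 = 4.5
```

Blocks with stronger ZC (contour) activity thus contribute more to the final blur/ringing score, which is exactly the weighting rationale of [7].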
The subjective scores are given as Difference Mean Opinion Scores (DMOS), scaled from 0 to 100, where a higher DMOS corresponds to a worse quality. A second database, the IVC database [13], has been used to evaluate the performance of the proposed metric; it contains 10 original images from which 235 distorted images were generated with 4 different distortions (JPEG, JPEG2000, LAR coding and blurring). We use only the images corrupted by JPEG2000 (50 images). Two operators are used to assess performance: the Pearson linear correlation coefficient (CC) and the Spearman rank-ordered correlation coefficient (SROCC). Before calculating the CC, the scores are passed through a logistic non-linearity as in [12]:

Quality(x) = β1 · logistic(β2, (x - β3)) + β4 · x + β5    (15)

logistic(τ, x) = 1/2 - 1 / (1 + exp(τ x))    (16)

TABLE I. CC AND SROCC OVER DIFFERENT BLOCK SIZES (LIVE DATABASE)

Block size | CC    | SROCC
5          | 0.908 | —
8          | 0.924 | 0.916
—          | 0.917 | —
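The logistic mapping of Eqs. (15)-(16) and the two correlation operators can be sketched in numpy. The beta values below are illustrative assumptions; in practice they are fitted to the subjective scores as in [12]. Note that the monotone mapping leaves the Spearman coefficient unchanged.

```python
import numpy as np

def logistic(tau, x):
    """Eq. (16)."""
    return 0.5 - 1.0 / (1.0 + np.exp(tau * np.asarray(x, dtype=float)))

def quality(x, b1, b2, b3, b4, b5):
    """Eq. (15): five-parameter logistic mapping of raw scores."""
    x = np.asarray(x, dtype=float)
    return b1 * logistic(b2, x - b3) + b4 * x + b5

def pearson(a, b):
    """Pearson linear correlation coefficient (CC)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def spearman(a, b):
    """Spearman rank-ordered correlation coefficient (SROCC)."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v))).astype(float)
    return pearson(rank(a), rank(b))

scores = np.array([0.1, 0.4, 0.5, 0.8])
dmos = quality(scores, 20.0, 5.0, 0.5, 1.0, 50.0)   # illustrative betas
srocc = spearman(scores, dmos)   # 1.0: a monotone mapping preserves ranks
```

This is why [12] applies the non-linearity only before computing the CC: the SROCC is insensitive to any monotone rescaling of the scores.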

The obtained results are tabulated in Table I for different block sizes. The performance of the proposed measure changes with the block size: when the block size is decreased the performance becomes questionable, while for larger sizes it is stable. It is clear that a block size of 8 gives good results. To position our metric we chose two full-reference metrics for comparison, the PSNR (peak signal-to-noise ratio) and the SSIM (Structural Similarity Index Measure). Tables II and III summarize the results, with the block size of the proposed measure set to 8, while Fig. 3 and Fig. 4 plot DMOS versus the quality scores computed by the proposed measure.

TABLE II. COMPARISON OF THE PROPOSED MEASURE WITH PSNR AND SSIM (LIVE DATABASE)

Measure          | CC    | SROCC
PSNR             | 0.903 | 0.899
SSIM             | 0.969 | 0.963
Proposed measure | 0.924 | 0.916

TABLE III. COMPARISON OF THE PROPOSED MEASURE WITH PSNR AND SSIM (IVC DATABASE)

Measure          | CC    | SROCC
PSNR             | 0.847 | 0.850
SSIM             | 0.932 | 0.929
Proposed measure | 0.905 | 0.897

Fig. 3. Plot of DMOS versus quality scores computed by the proposed measure (LIVE database).

Fig. 4. Plot of DMOS versus quality scores computed by the proposed measure (IVC database).

From Table II it is clear that the proposed measure performs better than the full-reference PSNR. The SSIM measure gives better results than the proposed measure, but one must keep in mind that the proposed measure is no-reference. Table III presents the results of the tests conducted on the IVC database: the proposed measure outperforms PSNR, while its results are competitive with those of SSIM. In Fig. 3 (LIVE) and Fig. 4 (IVC), most points lie close to the fitted logistic curve, which means that the proposed measure predicts the DMOS of the tested images well.

Returning to the choice of our features: the monotone changing (MC) activity is justified by its sensitivity to monotone intensity change. As the monotone intensity changes, the spread of the MC increases or decreases, which can be related to the presence of blur: a long spread means the edges are spread out, which corresponds to the presence of blur, and vice versa. For this reason the MC activity is well suited to measuring blur. As for the ringing artefacts, it is well known that ringing occurs around contours, and the ZC activity reflects the structural information of these contours; ZC is therefore adopted to measure ringing.

IV. CONCLUSION

In this work a no-reference IQA measure for JPEG2000-compressed images has been proposed. It is based on natural scene statistics, which are used to calculate the monotone changing and zero-crossing activities. A spatial pooling approach is used to obtain the global score and, since natural images are multi-scale, all computations are conducted over two scales. The obtained results are satisfying and agree with human judgements. Future work should focus on extending this measure to other types of image degradation.

REFERENCES

[1] P. Marziliano and F. Dufaux, "Perceptual blur and ringing metrics: application to JPEG2000," Elsevier, 2003.
[2] H. R. Sheikh, A. C. Bovik and L. Cormack, "No-Reference Quality Assessment Using Natural Scene Statistics: JPEG2000," IEEE Transactions on Image Processing, vol. 14, no. 11, November 2005.
[3] H. Liu, N. Klomp and I. Heynderickx, "A No-Reference Metric for Perceived Ringing Artifacts in Images," IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 4, April 2010.

[4] H. Liu et al., "No-Reference Image Quality Assessment Based on Localized Gradient Statistics: Application to JPEG and JPEG2000," Human Vision and Electronic Imaging XV, B. E. Rogowitz and T. N. Pappas, Eds., Proc. SPIE-IST Electronic Imaging, SPIE vol. 7527, 75271F.
[5] J. Zhang and T. M. Le, "A New No-Reference Quality Metric for JPEG2000 Images," IEEE Transactions on Consumer Electronics, vol. 56, no. 2, May 2010.
[6] D. Ruderman, "The statistics of natural images," Network: Computation in Neural Systems, vol. 5, no. 4, 1994.
[7] Z. Wang and X. Shang, "Spatial pooling strategies for perceptual image quality assessment," ICIP, 2006.
[8] A. Mittal, A. Krishna Moorthy and A. Conrad Bovik, "No-Reference Image Quality Assessment in the Spatial Domain," IEEE Transactions on Image Processing, vol. 21, no. 12, December 2012.
[9] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Prentice-Hall, New Jersey, 2002.
[10] M. Saad, A. C. Bovik and C. Charrier, "Blind image quality assessment: a natural scene statistics approach in the DCT domain," IEEE Transactions on Image Processing, vol. 21, no. 8, August 2012.
[11] Z. Wang, E. P. Simoncelli and A. C. Bovik, "Multiscale structural similarity for image quality assessment," Proc. Asilomar Conf. Signals, Systems and Computers, 2003.
[12] H. R. Sheikh, M. F. Sabir and A. C. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms," IEEE Transactions on Image Processing, vol. 15, no. 11, November 2006.
[13] A. Ninassi, P. Le Callet and F. Autrusseau, "Pseudo No Reference image quality metric using perceptual data hiding," SPIE Human Vision and Electronic Imaging, San Jose, CA, USA, January 2006.

A SYMMETRIC IMAGE ENCRYPTION SCHEME BASED ON DIAGONAL MATRICES AND THE XOR OPERATION

Assia Beloucif, Department of Computer Science, University of Batna, Batna, Algeria
Lemnouar Noui, Department of Mathematics, University of Batna, Batna, Algeria

Abstract — In order to protect digital images against unauthorized access, cryptography is the main solution conceived to alleviate this problem. Due to the redundancy and correlation of image information, conventional encryption methods designed for textual data are not appropriate for image encryption. In this paper we propose a simple, fast and secure scheme for image encryption based on the use of sparse matrices built from diagonal matrices together with the XOR operation. We describe our algorithm in detail and show its effectiveness, robustness and resistance to statistical analysis, entropy analysis, key-sensitivity attacks and known/chosen-plaintext attacks. Moreover, the proposed method possesses a large key space, resists brute-force attacks, is easy to implement, and uses only integers.

Index Terms — Image encryption, confidentiality, Hill cipher, matrix transformation.

I. INTRODUCTION

Nowadays the exchange of information across the internet has become an essential component of modern society, particularly with the enormous growth of information technology networks. An invaluable type of information typically involved in modern communication is digital images, which are used in several sensitive domains such as electronic commerce, military affairs and medical records. To protect sensitive information against unauthorized access when it is stored or transmitted across an insecure network, encryption is the main solution conceived to alleviate this problem.
Due to the redundancy and correlation of image information, conventional encryption methods designed for textual data, such as RSA [1], the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES), are not appropriate for image encryption; special care is needed when encrypting such data [2]. According to Shannon [3], confusion (substitution) and diffusion (permutation) are the two primary methods to overcome such high redundancy and strong correlation. The Hill cipher [4], [5] is a well-known symmetric polygraphic cipher based on the modular multiplication of matrices, where the key is a square invertible matrix. Let a block x of size n be the plaintext, and let a square matrix K of size n x n, whose determinant is coprime with m, be the secret key. The substitution operation in Hill consists of left-multiplying the block x by K modulo m; in the decryption operation, the ciphertext is multiplied by the inverse matrix K^-1 of the key K. The Hill cipher hides frequency information, so it can resist statistical analysis; it has a large key space and can resist brute-force attacks [6]. It is strong against a ciphertext-only attack but easily broken by a known-plaintext attack [7]. Moreover, the need to generate a large invertible key matrix and to compute its inverse is another obstacle to the practical use of this cipher. To overcome these problems, Sadeenia [8] proposed a method requiring the exchange of a master key matrix only once, which reduces the problem of computing inverse matrices, and tries to prevent known-plaintext attacks by using a dynamic key matrix obtained by random permutations of the columns and rows of the master key matrix; it is, however, still vulnerable to known-plaintext attacks [9].
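The modular matrix multiplication at the heart of the Hill cipher can be illustrated with the classic textbook 2x2 example modulo 26 (the key below is the standard textbook example, not a key from this paper):

```python
import numpy as np

K = np.array([[3, 3],
              [2, 5]])       # det = 9, coprime with 26, so K is invertible mod 26
K_inv = np.array([[15, 17],
                  [20, 9]])  # (K @ K_inv) % 26 is the identity matrix

def hill_encrypt(block, key, m=26):
    """Left-multiply the plaintext block by the key, modulo m."""
    return (key @ np.asarray(block)) % m

def hill_decrypt(block, key_inv, m=26):
    """Left-multiply the ciphertext block by the inverse key, modulo m."""
    return (key_inv @ np.asarray(block)) % m

plain = np.array([7, 4])                 # "HE" as 0-25 letter indices
cipher = hill_encrypt(plain, K)          # -> [7, 8]
restored = hill_decrypt(cipher, K_inv)   # -> [7, 4]
```

The need to pre-compute K_inv for every key is exactly the practical obstacle the paragraph above mentions, and it grows quickly with the matrix size n.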
Another Hill modification, proposed in [10], is based on modifying each row of the key matrix by multiplying the current key by an initial vector; however, the proposed algorithm is vulnerable to known/chosen-plaintext attacks [11]. In [12] the authors suggested the use of self-invertible matrices in a modified Hill encryption which resists chosen-plaintext attacks; however, the proposed scheme is slow. Another modification of the Hill cipher, based on a circulant matrix as secret key and a non-singular matrix as public key, is proposed in [13]; we find, however, that this scheme is vulnerable to a known-plaintext attack with only one chosen plaintext, namely the all-zero plain image. A further Hill cipher modification based on pseudo-random eigenvalues has been proposed recently in [14]; the security of that scheme against known/chosen-plaintext attacks relies on using a different key for the encryption of each plaintext.

Another category of image encryption exploits the advantages of chaotic systems, which exhibit a high degree of non-linearity. The Arnold cat map is often used to shuffle pixel positions in many cryptosystems [15], [16], but the map has two weaknesses; one is that the number of iterations is very limited, usually fewer than 1000. Many other chaos-based image encryption algorithms have been shown by cryptanalysis to have security weaknesses [17], [18], [19], [20]; for example, they are incapable of resisting chosen-plaintext attacks.

In this paper we propose a simple and fast algorithm for image encryption based on matrix transformations and the XOR operation. Security analysis shows that the key space of the new algorithm is sufficiently large to make brute-force attacks infeasible. Simulation results illustrate that suitable performance is achieved and show that our algorithm outperforms current image encryption algorithms in terms of security, sensitivity and speed. Based on these results, we can state that the proposed scheme is well suited to real-time image encryption and transmission applications. The rest of this paper is organized as follows. Section 2 presents the basic concepts; Section 3 gives an overview of the proposed algorithm; Section 4 discusses and analyzes the security of the proposed solution; Section 5 concludes the paper.

II. BASIC CONCEPTS

A. Diagonal matrix

Let D(D1, D2, ..., Dn) be an n x n matrix with entries in a field K. D is a diagonal matrix if its diagonal entries D(i, i) are arbitrarily chosen as D1, D2, ..., Dn and D(i, j) = 0 for i ≠ j. Thus a diagonal matrix can be written as D = diag(D1, D2, ..., Dn).

B. Inverse of a diagonal matrix

A matrix is invertible if its determinant is non-zero, and the determinant of a diagonal matrix is the product of its diagonal entries; it is therefore sufficient to verify that none of the diagonal elements is zero.
The inverse D^-1(D_1, D_2, ..., D_n) of D(D_1, D_2, ..., D_n) is the diagonal matrix whose diagonal entries are the reciprocals 1/D_1, 1/D_2, ..., 1/D_n.

III. PROPOSED ALGORITHM

The detailed encryption algorithm is described below, and the block diagram of the encryption process is presented in Fig. 1.
Step 1. A diagonal matrix D(S) and its inverse D^-1(S) are constructed, where S(S_1, S_2, ..., S_n) of size n is a preferably random secret-key vector whose entries are all odd.
Step 2. Let I be the original image of size n x n. Compute the pixel gray values of the cipher-image by a two-point diffusion transformation (Eqs. (1)-(2)), where the symbol ⊕ represents the bit-by-bit exclusive-OR operation.
Step 3. The obtained image I' is multiplied on the left by the diagonal matrix D(S) and on the right by the matrix D^-1(S).
Step 4. Repeat Step 2 once using the matrix C obtained in Step 3.

IV. PERFORMANCE ANALYSIS

A gray image, Lena, of size 256 x 256 pixels, is used as the original image. Numerical simulations have been performed to verify the performance of the proposed encryption scheme. Moreover, four comparable cryptosystems are also investigated, and the performance of the five image encryption schemes is compared. Figure 2 shows that the proposed algorithm can encrypt images that contain large areas of a single color.

A. Key space and sensitivity analysis

In the proposed cryptosystem the vector S of size 256 is used as the secret key, where 256 is the side length of the square original image, and each odd entry of the vector is coded on 7 bits. The key space can therefore reach 2^(7x256) = 2^1792, which is large enough to make any brute-force attack ineffective. On the other hand, a robust image encryption scheme should be sensitive to the cipher key. To evaluate key sensitivity, two slightly different keys, differing by 1, are used to encrypt the plain image Lena with the different schemes. The two ciphered images, encrypted with the two slightly different keys, are compared in Table 1; there is a 99.62% difference between the two cipher-images.
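Steps 1-4 of the proposed scheme can be sketched in code. The exact diffusion equations (1)-(2) are not legible in the scanned source, so the chained XOR pass below is a plausible stand-in for the "two-point diffusion"; the diagonal multiplication of Step 3, however, follows the text directly (all entries of S are odd and hence invertible modulo 256). Function names are ours.

```python
import numpy as np

M = 256  # modulus for 8-bit gray levels

def xor_diffuse(img, seed=0):
    """Chained XOR diffusion: an illustrative stand-in for Eqs. (1)-(2)."""
    flat = img.flatten().astype(np.uint8)
    out = np.empty_like(flat)
    prev = np.uint8(seed)
    for i, p in enumerate(flat):
        prev = p ^ prev      # each output pixel depends on the previous one
        out[i] = prev
    return out.reshape(img.shape)

def encrypt(img, S):
    """Steps 1-4, assuming every entry of the key vector S is odd."""
    S = np.asarray(S) % M
    S_inv = np.array([pow(int(s), -1, M) for s in S])  # modular inverses
    t = xor_diffuse(img)                               # Step 2
    t = (S[:, None] * t * S_inv[None, :]) % M          # Step 3: D(S) @ t @ D^-1(S)
    return xor_diffuse(t)                              # Step 4
```

Multiplying on the left by the diagonal matrix D(S) scales row i by S_i, and on the right by D^-1(S) scales column j by S_j^-1, which is what the broadcasted product computes.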
Moreover, if a slightly modified key is used to decrypt the cipher-image, the decryption fails completely. The test results are shown in Fig. 3.

B. Differential attack

To find the relationship between the original image and the encrypted image, an opponent makes a slight change to the plaintext in order to test the influence of changing a single pixel of the original image on the encrypted image; this kind of attack is called a differential attack. In order to test the

influence of a one-pixel change on the cipher-image, two common quantitative measures are used: the Number of Pixels Change Rate (NPCR) and the Unified Average Changing Intensity (UACI) [21]. They are defined as follows:

NPCR = (sum_{i,j} f(i, j) / (W x H)) x 100%   (3)
UACI = (1 / (W x H)) sum_{i,j} (|C_1(i, j) - C_2(i, j)| / 255) x 100%   (4)

Fig. 1. The block diagram of the proposed algorithm.
Fig. 2. Original images (a, c) and the corresponding encrypted images (b, d) using the proposed algorithm.
TABLE I. KEY SENSITIVITY ANALYSIS (DIFFERENCE, %) USING THE DIFFERENT SCHEMES: Hill [4], Saeednia [8], Ismail [10], Reddy [13], proposed algorithm.

Here W and H represent the width and the height of the image, respectively, and C_1(i, j) and C_2(i, j) are the encrypted images before and after one pixel of the plain image is changed. For the pixel at position (i, j), if C_1(i, j) != C_2(i, j) then f(i, j) = 1; otherwise f(i, j) = 0. Two plain images are used in the tests: the first is the original plain image Lena, and the second is obtained by changing the value of its last pixel by 1. The two images are encrypted with the same key to generate the corresponding cipher-images C_1 and C_2, and the average NPCR and UACI values for the five algorithms are listed in Table 2. As can be seen from the simulation results, the proposed algorithm achieves high performance and can resist the differential attack, whereas the other schemes cannot.

C. Statistical analysis

1) Histogram analysis
An image histogram illustrates the distribution of the pixel values in an image. The image Lena is encrypted with the proposed algorithm, and the histograms of the plain image and the cipher-image are plotted in Fig. 3 (a) and (b), respectively. The histogram of the ciphered image shows that the pixel values are uniformly distributed after encryption, and hence it does not provide any useful information to an opponent.
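The NPCR and UACI measures used in the differential-attack test are direct to compute; a minimal sketch for 8-bit images (function names ours):

```python
import numpy as np

def npcr(c1, c2):
    """Number of Pixels Change Rate, Eq. (3): percentage of differing pixels."""
    return 100.0 * np.mean(c1 != c2)

def uaci(c1, c2):
    """Unified Average Changing Intensity, Eq. (4), for 8-bit gray levels."""
    return 100.0 * np.mean(np.abs(c1.astype(int) - c2.astype(int)) / 255.0)
```

For instance, two 2x2 images differing in one pixel by the full 0-255 range give NPCR = 25% (one pixel of four changed) and UACI = 25% (one full-intensity change averaged over four pixels).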
2) Correlation between adjacent pixels
In a typical image, each pixel is highly correlated with its adjacent pixels, and an ideal encryption technique should produce cipher-images with no such correlation between adjacent pixels. In this section, the correlation coefficient of two adjacent pixels in the original and encrypted images is studied. We analyzed the correlation between two vertically adjacent pixels, two horizontally adjacent pixels, and two diagonally adjacent pixels: pairs of adjacent pixels in each direction were randomly selected from the plain image Lena and its cipher-image, and the correlation coefficients were calculated using the following equations:

r_xy = cov(x, y) / (sqrt(D(x)) sqrt(D(y)))   (5)
cov(x, y) = (1/N) sum_{i=1}^{N} (x_i - E(x))(y_i - E(y))   (6)
E(x) = (1/N) sum_{i=1}^{N} x_i,   D(x) = (1/N) sum_{i=1}^{N} (x_i - E(x))^2   (7)
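The adjacent-pixel correlation test can be sketched as follows. The number of sampled pairs in the paper is not legible in the scan, so the default below is an assumption; the formula is the standard correlation coefficient of Eqs. (5)-(7).

```python
import numpy as np

def adjacent_correlation(img, direction="horizontal", n_pairs=2000, seed=0):
    """Correlation coefficient r_xy over randomly sampled adjacent-pixel pairs.
    The pair count is illustrative; the source value is not recoverable."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    dy, dx = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}[direction]
    ys = rng.integers(0, h - dy, n_pairs)
    xs = rng.integers(0, w - dx, n_pairs)
    x = img[ys, xs].astype(float)            # sampled pixels
    y = img[ys + dy, xs + dx].astype(float)  # their neighbors in the chosen direction
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())
```

A smooth gradient image yields a coefficient near 1, while a uniformly random image yields a value near 0, matching the plain-image/cipher-image contrast reported in Table 3.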

Fig. 3. Histogram analysis: (a) histogram of Lena and (b) histogram of ciphered Lena.
Fig. 4. Horizontal, diagonal and vertical correlations of two adjacent pixels in the plain image and in the cipher-image: (a), (c), and (e) are for the plain image; (b), (d), and (f) are for the cipher-image.
TABLE II. NPCR (%) AND UACI (%) VALUES USING THE DIFFERENT SCHEMES: Hill [4], Saeednia [8], Ismail [10], Reddy [13], proposed algorithm.
TABLE III. CORRELATION COEFFICIENTS OF TWO ADJACENT PIXELS (HORIZONTAL, VERTICAL, DIAGONAL) IN THE PLAIN IMAGE AND THE CIPHER-IMAGE USING THE PROPOSED SCHEME.

Here x and y are the grey-level values of two adjacent pixels in the image. Table 3 lists the correlation coefficients of the image Lena and its cipher-image: the measured correlation coefficients of the plain image are nearly 1, while those of the cipher-image are close to 0. This indicates that the proposed algorithm has successfully removed the correlation between adjacent pixels of the plain image. Figure 4 shows the correlations of two diagonally, horizontally and vertically adjacent pixels in the plain image and the cipher-image, and it is clear that neighboring pixels in the cipher-image have virtually no correlation.

D. Information entropy analysis

Information entropy is the most important measure of randomness. Let m be the information source. The formula for calculating information entropy is:

H(m) = sum_{i} p(m_i) log2(1 / p(m_i))   (8)

For a truly random source emitting 2^N symbols, the entropy is H(m) = N. Therefore, for a ciphered image with 256 gray levels, the entropy should ideally be H(m) = 8. The entropy of the image ciphered with the proposed method is very close to this ideal value, so the system can resist entropy attacks.

E. Resistance to known/chosen-plaintext attacks

Generally, the chosen-plaintext attack is the most powerful classical attack: if a cryptosystem can resist this attack, it can resist the other types of classical attacks.
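The entropy measure of subsection D can be computed over the 256 gray levels of a ciphered image as follows (a minimal sketch; the function name is ours):

```python
import numpy as np

def entropy(img):
    """Shannon entropy H(m) = sum p_i * log2(1/p_i) over 256 gray levels, Eq. (8)."""
    counts = np.bincount(img.astype(np.uint8).ravel(), minlength=256)
    p = counts[counts > 0] / counts.sum()  # empirical symbol probabilities
    return float(-np.sum(p * np.log2(p)))
```

An image using all 256 gray levels with equal frequency attains the ideal value H(m) = 8, while a constant image has entropy 0.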
In the proposed algorithm, the attacker cannot obtain useful information by encrypting special images, such as the all-zero image or the image whose pixels represent the identity matrix, since the encryption of a pixel depends on the plain image. Therefore, the proposed scheme can effectively resist the known-plaintext

and chosen-plaintext attacks described for the Hill cipher in [7], [22].

V. CONCLUSION

In this paper, a new fast and simple image encryption algorithm is proposed. The analysis shows that the proposed cryptosystem offers high security against brute-force attacks thanks to an extremely large key space. Simulation results show the effectiveness and security of the proposed scheme against statistical analysis, brute-force attack, key sensitivity attacks and differential analysis. Hence, based on the achieved results, we consider the proposed scheme well suited to real-time image encryption and transmission applications.

VI. REFERENCES
[1] Ronald L. Rivest, Adi Shamir, and Len Adleman, "A method for obtaining digital signatures and public-key cryptosystems," Communications of the ACM, vol. 21, no. 2.
[2] Andreas Uhl and Andreas Pommer, Image and Video Encryption: From Digital Rights Management to Secured Personal Communication, vol. 15, Springer.
[3] Claude E. Shannon, "Communication theory of secrecy systems," Bell System Technical Journal, vol. 28, no. 4.
[4] Lester S. Hill, "Cryptography in an algebraic alphabet," The American Mathematical Monthly, vol. 36, no. 6.
[5] Lester S. Hill, "Concerning certain linear transformation apparatus of cryptography," The American Mathematical Monthly, vol. 38, no. 3.
[6] Jeffrey Overbey, William Traves, and Jerzy Wojdylo, "On the keyspace of the Hill cipher," Cryptologia, vol. 29, no. 1.
[7] William Stallings, Network and Internetwork Security: Principles and Practice, Prentice-Hall, Inc.
[8] Shahrokh Saeednia, "How to make the Hill cipher secure," Cryptologia, vol. 24, no. 4.
[9] Chu-Hsing Lin, Chia-Yin Lee, and Chen-Yu Lee, "Comments on Saeednia's improved scheme for the Hill cipher," Journal of the Chinese Institute of Engineers, vol. 27, no. 5.
[10] I. A. Ismail, Mohammed Amin, and Hossam Diab, "How to repair the Hill cipher," Journal of Zhejiang University SCIENCE A, vol. 7, no.
12.
[11] Y. Rangel-Romero, R. Vega-García, A. Menchaca-Mendez, D. Acoltzi-Cervantes, L. Martínez-Ramos, M. Mecate-Zambrano, F. Montalvo-Lezama, J. Barrón-Vidales, N. Cortez-Duarte, and F. Rodríguez-Henríquez, "Comments on 'How to repair the Hill cipher'," Journal of Zhejiang University SCIENCE A, vol. 9, no. 2.
[12] Bibhudendra Acharya, Sambit Kumar Shukla, Saroj Kumar Panigrahy, Sarat Kumar Patra, and Ganapati Panda, "HSX cryptosystem and its application to image encryption," in Advances in Computing, Control, & Telecommunication Technologies (ACT '09), International Conference on, IEEE, 2009.
[13] K. Adinarayana Reddy, B. Vishnuvardhan, A. V. N. Krishna, et al., "A modified Hill cipher based on circulant matrices," Procedia Technology, vol. 4.
[14] Ahmed Mahmoud and Alexander Chefranov, "Hill cipher modification based on pseudo-random eigenvalues," Applied Mathematics & Information Sciences, vol. 8, no. 2.
[15] Medina Ablikim, Z. H. An, J. Z. Bai, Niklaus Berger, J. M. Bian, X. Cai, G. F. Cao, X. X. Cao, J. F. Chang, C. Chen, et al., "Design and construction of the BESIII detector," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 614, no. 3.
[16] Jun Wei, Xiaofeng Liao, Kwok-Wo Wong, and Tao Xiang, "A new chaotic cryptosystem," Chaos, Solitons & Fractals, vol. 30, no. 5.
[17] Rhouma Rhouma and Safya Belghith, "Cryptanalysis of a new image encryption algorithm based on hyper-chaos," Physics Letters A, vol. 372, no. 38.
[18] Xin Ge, Fenlin Liu, Bin Lu, and Wei Wang, "Cryptanalysis of a spatiotemporal chaotic image/video cryptosystem and its improved version," Physics Letters A, vol. 375, no. 5.
[19] Chengqing Li, Shujun Li, Guanrong Chen, and Wolfgang A. Halang, "Cryptanalysis of an image encryption scheme based on a compound chaotic sequence," Image and Vision Computing, vol. 27, no.
8.
[20] Hong Liu and Yanbing Liu, "Cryptanalyzing an image encryption scheme based on hybrid chaotic system and cyclic elliptic curve," Optics & Laser Technology, vol. 56.
[21] C. K. Huang and H. H. Nien, "Multi chaotic systems based pixel shuffle for image encryption," Optics Communications, vol. 282, no. 11.
[22] Craig Bauer and Katherine Millward, "Cracking matrix encryption row by row," Cryptologia, vol. 31, no. 1.

An Improved CUDA-based Hybrid Metaheuristic for a Fast Controller of an Evolutionary Robot

Nour EL-Houda BENALIA*, LESIA Laboratory, Biskra University, Algeria
Nesrine OUANNES, LESIA Laboratory, Biskra University, Algeria
NourEddine DJEDI, LESIA Laboratory, Biskra University, Algeria

Abstract — Simulation of sophisticated biological models requires considerable computational power. Modern GPUs (Graphics Processing Units) enable widely affordable personal computers to carry out massively parallel computation tasks. These new devices open up the possibility of developing more integrative, detailed and predictive biological models, while at the same time decreasing the computational cost of simulating them. This work demonstrates that GPU algorithms represent a significant technological advance for the simulation of complex biological models. Our model is a hybridization of an evolutionary algorithm (EA) and a recurrent neural network (RNN), developed and tested to generate optimal trajectories for a humanoid robot simulated with the Open Dynamics Engine (ODE) library, using GPU acceleration at multiple levels and taking into consideration effective utilization of the GPU memory hierarchy, judicious management of parallelism, and several GPU code optimization techniques. The model was derived from our serial CPU implementation. This paper presents the implementation details of our proposal on a PC with two GTX 480 GPUs. Since EAs and RNNs are inherently parallel, the GPGPU computing paradigm yields a promising speedup in the evolutionary phase with respect to the CPU-based version.

Index Terms — Artificial life, Parallel Evolutionary Algorithms, Robotics, GPU.

I. INTRODUCTION

The recent advances in programming for graphics processing units (GPUs) open up the possibility of developing more integrative, detailed and predictive bio-inspired and artificial-life models, while at the same time decreasing the computational cost of simulating those models.
Simulation models are continuously evolving in terms of constraints and objectives, and when their resolution takes many hours or even days to execute, the exploratory nature of modeling is inhibited simply by the limits of available time. This is heightened by the fact that these complex models can also have many additional parameters that must be considered for their role in the behavior of the model. For these reasons, considerable effort is put into technologies, methodologies and theoretical advances in order to speed up execution without sacrificing model accuracy [1][9]. Nowadays, many algorithms are rewritten and redesigned for modern graphics cards (GPUs) [2], which offer massive SIMD parallelism. On the other hand, the availability of new, appropriate development environments tends to further simplify the development of parallel applications for this type of processor, which has recently been used to accelerate computations in various fields. Researchers in biomechanics, robotics and computer science work to understand natural human motion in order to reproduce it in other forms. The aim of humanoid robotics researchers is to obtain robots that can imitate human behaviors so as to collaborate with humans in the best possible way. An obvious problem confronting humanoid robotics is the generation of stable and efficient gaits in a reasonable time. Thus, in order to address this problem, alternative, biologically inspired control methods that do not require the specification of reference trajectories have been proposed in [3]. Researchers have examined alternative hardware ANN (Artificial Neural Network) solutions using FPGAs (Field-Programmable Gate Arrays) and GPUs that deliver a fine-grained parallel architecture. The authors in [4] highlight that FPGA approaches have several limitations, such as inefficient area utilization and a Manhattan-style interconnect.
However, [5] indicates that the problem of accessing memory in a coherent manner and the limited memory bandwidth are major drawbacks for ANNs on GPU platforms; these drawbacks can be eliminated with NVIDIA's new architectures and code optimization techniques, as discussed in the rest of this paper. Our new system brings several contributions; the salient issues concern the acceleration of a previously proposed method [3], namely the combination of an EA and an RNN that composes the brain of our robot. Evolutionary computation is perfectly well suited to efficient execution on massively parallel GPGPU cards. This paper shows how to implement, using GPU algorithms, an improved technique to accelerate the robot-controller process on GPUs so that it executes many times faster than conventional CPU code. Furthermore, we provide a set of generic guidelines for GPU programming that allow modelers to take advantage of GPUs while avoiding many pitfalls associated with the architecture. The remainder of this paper is organized as follows. Section 2 introduces GPU computing using CUDA, followed by a description of the type of neural network and the evolutionary algorithm used, and then an explanation of the demonstration example in Section 3. Our proposed and improved CUDA-based approach is detailed in Section 4, with experimental results described and analyzed in Section 5. Finally, Section 6 draws conclusions and presents some perspectives.

II. BACKGROUND AND RELATED WORK

A. CUDA Programming Model

Nowadays, user-friendly APIs suitable for general-purpose GPU computing have been developed in order to reduce programming difficulty. NVIDIA's CUDA (Compute Unified Device Architecture) technology provides a parallel computing architecture on modern NVIDIA GPUs, on which hundreds of streaming processors (SPs) are grouped into several streaming multiprocessors (SMs), each with its own flow control and on-chip shared memory units. All SMs share a global memory with high latency. The number of SMs, the size of global memory, the number of SPs, the size of shared memory and the number of registers depend on the GPU's compute capability [7]. Sequential operations should be programmed as host functions that execute on the CPU, while parallelizable operations should be programmed as kernel or device functions that execute on the GPU; both host and kernel functions are wrapped and called via a main host function. In the CUDA environment, the basic unit of parallel code is a thread, and thousands of threads can run concurrently with the same instruction set. Each thread runs the same program, called a kernel, and can employ registers as fast-access memory. Communication among threads can be performed through shared memory, a very fast type of memory that allows both read and write accesses. A set of threads is bundled into a grid of thread blocks, with each block containing a fixed number of threads. The grid of blocks can have up to three dimensions; its size is given by the predefined struct variable gridDim, whose fields x, y and z store the number of blocks in each of the three dimensions. Each block in a grid is indexed by the predefined struct variable blockIdx, whose fields x, y and z store the position of the corresponding block in the grid.
A block can have up to three dimensions of threads, with its size given by the predefined struct variable blockDim, whose fields x, y and z store the number of threads in each of the three dimensions. Each thread in a block is indexed by the predefined struct variable threadIdx, whose fields x, y and z store the position of the corresponding thread in the block. Communication between CPU and GPU can be done through global device memory, constant memory, or texture memory on a GPU board. After compilation by the CUDA environment, a program runs as a kernel on the GPU. A kernel takes input parameters, performs computations, and writes the results to device memory, where they can be read by the CPU. Each thread must perform the same operation in the kernel, but the input data can be different; with thousands of threads doing similar tasks simultaneously, the computation speed can be improved significantly. The CPU runs the host code, which prepares input data and accepts output values from the GPU, while the intensive computation task is handled by GPU kernels. The output data is written to global device memory so that it can be retrieved by a CPU program. For a fully comprehensive description, the reader is referred to the CUDA C Programming Guide [17] and similar resources.

B. ANN: Serial vs. Parallel GPU-based

ANNs have been part of an attempt to emulate the learning phase of the human nervous system. However, the main difference arises from the fact that the nervous system is massively parallel [10], while the computer processor remains essentially sequential. Since an ANN requires a considerable number of vector and matrix operations to produce results, it is very well suited to a parallel programming model running on GPUs. But few studies have applied CUDA to neural networks, and this subject requires further investigation in applying CUDA technology to neural computation and robotics research [11][12].
Various types of neural networks are used to generate walking behaviors and control designs for humanoid robots. The majority of the proposed control algorithms have been verified by simulation, with little experimental verification on real biped and humanoid robots. Our system is based on the simulation of a recurrent neural network (RNN) benchmark trained with an evolutionary algorithm for locomotion behaviors; see [3] for more details. An evolution strategy is used as an optimization process to evolve neural networks that prove to be robust solutions to difficult real-world learning tasks, without the need to supply additional information or for an external agent to direct the process. For this task, we must exploit a large recurrent neural network connected to simulated muscles as the proprioceptive and motor system of a humanoid robot with tens of degrees of freedom. The neural network must possess more than thousands of neurons, because this makes it easier to extract and set the correct information for each joint [12]. More details about the parts of the RNN and of the ES that run on the GPU are given in the next sections.

C. GPU-based Evolution Strategy (ES) as an ANN Training Process: Serial vs. Parallel GPU-based

EAs are ideal for use on a parallel computing platform for a multitude of reasons [13]: since the evaluation of individuals is an independent event, the evaluation task can be performed in parallel, and most EA operators involve only one or two solutions, making them ideal for a parallel implementation. The ES is one of the most popular EA methods; in our case, it uses populations of candidate solutions to simultaneously explore the search space of ANN parameters. Most works on the GPU are implemented with the CUDA development toolkit, which allows programming GPUs in a more accessible C-like language. In their investigation, Arora et al. presented a CUDA-based implementation of genetic algorithms [13].
The contribution of their work is to study the effect of a set of parameters (e.g. thread size, problem size or population size) on the performance of their GPU implementation relative to a sequential genetic algorithm (GA). Pospichal et al. adopted a parallel GA with an island model running on a GPU with 256 individuals per island [14]. Their design is based on mapping threads to individuals; the on-chip hardware scheduler is used to rapidly exchange islands between multiprocessors to hide memory latency, and shared memory within each multiprocessor is used to hold the populations. The authors report speedups of up to 7000 times on the GPU compared to the sequential CPU version of the algorithm. Oiso et al. [15] proposed a technique exploiting the parallelism between individuals and genes simultaneously; their implementation yielded results approximately 18 times faster than a CPU implementation in their benchmark tests. Kromer et al. [16] proposed a technique using the same analogy, but for differential evolution; with the help of the CUDA toolkit, the fitness of the schedules was evaluated faster than CPU implementations written in both C and object-oriented C++ code.

D. Sample of Demonstration: Humanoid Robot Model and Data Storage

Our demonstration is based on a robot model created from the primitives available in the ODE simulation package, which provides a very controlled environment without obstacles. A physically based model of bipedal locomotion describes the non-linear relationships between the forces and moments acting at each joint, the feet, and the position, velocity and acceleration of each joint angle. In addition to the geometrical data, a dynamic model requires kinematic data (mass, center of mass and inertia matrix for each link and joint, max/min motor torques and joint velocities), which are difficult to obtain and are often an overlooked source of simulation inaccuracy. All these data are copied into the main memory of the GPU in order to perform as much of the computation there as possible. To simulate interaction with the environment, detection and handling of collisions as well as suitable models of foot-ground contact are required. For more information about the model of our legged robot, Table 1 gives further details. TABLE 1.
BODY PARAMETERS OF THE ROBOT

Body part | Geometry primitive | Dimension (m)
Head      | Sphere             | Radius:
Arm       | Capped cylinder    | 0.14 x 0.25 x 0.44
Torso     | Rectangular box    | 0.9 x 0.25 x 1.0
Thigh     | Capped cylinder    | 0.20 x 0.25 x 0.7
Shank     | Capped cylinder    | 0.20 x 0.25 x 0.7
Foot      | Rectangular box    | 0.4 x 0.5 x 0.1

III. GPU-EVOLUTIONARY ROBOT CONTROLLER DESIGN

Over the past decade, physics-based simulation has become a key enabling technology for many applications, taking a front-seat role in computer games, animation of virtual worlds and robotic simulation. Physics engines are considered one of the most important of the multitude of components requesting CPU time. While some areas require high accuracy, in others the speed of simulation is more important. To meet this demand for acceleration, some physics engines already use GPU acceleration, for example for rigid-body collision detection and particle interaction. In our proposal, we adopted the GPGPU architecture proposed by Zamith et al. [18], where the GPU is used as a mathematics and physics coprocessor. The humanoid robot presented in this paper uses an Elman recurrent neural network for its biological plausibility and powerful memory capabilities. Furthermore, biological neural networks do not make use of back-propagation for learning; instead, we use EAs to evolve locomotive behaviors. In this work, we exploit parallelism at a higher level, forming groups of threads to handle a single chromosome, which corresponds to one robot. In this paper we focus on the technical details of the implementation of the EA; other details about the RNN and ODE on the GPU are out of the scope of this work. Moreover, the initialization of the first population is done in parallel, which means that all the RNNs are initialized and activated in parallel. We should mention here that an efficient mapping of the population is very important, because the occupancy rate of the graphics card is a factor that directly affects overall performance.
In order to simplify kernel invocations and save memory bandwidth, the EA parameters are copied to the GPU constant memory. Our EA population is laid out in the main memory of the GPU as a two-dimensional N x L matrix, where columns refer to chromosomes and rows correspond to genes within chromosomes, N being the population size and L the chromosome length. Since storing the variables of one individual sequentially in an array does not permit efficient memory access, to achieve memory coalescing the variables of one type belonging to different individuals of the population are stored adjacently in buffers.

IV. TECHNICAL IMPLEMENTATION OF THE EVOLUTIONARY TRAINING PROCESS

The key to an efficient implementation of the genetic-manipulation kernel is low divergence and enough data to utilize all the CUDA cores. The purpose of the EA is to optimize the weights of the RNN which controls the humanoid robot. A synergistic relationship exists between the EA and the RNN, as shown in Figure 1: the EA optimizes the RNN, and the RNN produces robot behavior that is then scored. At start-up, the population's chromosomes are initialized randomly, with one gene per RNN weight; the number of connections gives the number of genes in the chromosome, and each gene is represented by a floating-point number. Our system is equipped with three modules, as shown in Figure 2. The first is the Simulation World Module, which is responsible for the simulation task in the ODE environment, taking into consideration all the required physical laws. The second is the Brain Module, which creates the robot's brain using the RNN. The last is the principal module of this paper, the Evolution Module, in which we implement all the EA phases on the GPU as kernels.
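The gene-major ("structure-of-arrays") population layout described above can be illustrated with a small NumPy sketch. The sizes are toy values, not the paper's; the point is that variables of one type belonging to different individuals end up at consecutive addresses, which is what makes coalesced global-memory access possible on the GPU.

```python
import numpy as np

N, L = 8, 5  # population size and chromosome length (illustrative values)

# Array-of-structures: each chromosome is contiguous, so threads reading the
# same gene of different individuals would stride by L (uncoalesced on a GPU).
aos = np.arange(N * L, dtype=np.float32).reshape(N, L)

# Structure-of-arrays (the layout described in the text): gene g of every
# individual is contiguous, so threads mapped to individuals read consecutive
# addresses (coalesced access).
soa = np.ascontiguousarray(aos.T)  # shape (L, N)

# Both layouts hold the same data; only the memory order differs.
assert np.array_equal(soa[2], aos[:, 2])  # gene 2 of all N individuals
```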

To find the best parameter configuration (block shape, total number of threads) for our kernels, we apply an algorithm which measures the execution time of each configuration in order to select the most appropriate one while maximizing the occupancy of the card used; analyzing the performance of this algorithm is out of the scope of this work. The next subsections describe the CUDA kernels and the techniques used in the implementation phase.

A. Random Number Generation

One of the factors affecting the performance of evolutionary algorithms is the generation of random numbers, because these algorithms are stochastic search processes. However, the CUDA libraries do not at present include a random number generator (RNG) function, despite the fact that an RNG is naturally necessary for executing GA processes. In order to generate random numbers in our application, we use the Random123 library of counter-based random number generators (CBRNGs). The Random123 library can be used in CPU (C and C++) and GPU (CUDA and OpenCL) applications. This library was chosen for the speed of its generator (three times faster than the standard C rand function and more than 10 times faster than the CUDA cuRAND generator [19]).

B. Population Initialization Kernel

As mentioned above, the elements of each chromosome represent the weights of the RNN. The operation of generating these values is fully parallelizable, and all chromosomes are initialized at the same time. This phase is performed by one block of threads per chromosome, as shown in Figure 3.
Fig.1. Configuration of RNN and EA with ODE on GPUs. Fig.2. The main classes of the proposed model (GPU_Brain, GPU_Neuron, GPU_Organism, GPU_Humanoid, GPU_Evolution, GPU_Parameters, GPU_Individuals, GPU_Statistics). C. Selection and Evolutionary process kernel Based on the fitness values of the population members, the population chromosomes are sorted. We adopted two types of selection process, Roulette Wheel selection and tournament selection, each performed by one thread in a block. The crossover kernel uses a one-point crossover. The kernel performs this process with two individuals to make the necessary exchanges between them, so the number of blocks in the kernel is (PopSize/2). Each block is divided into two dimensions: the X dimension represents the genes of one chromosome, while the Y dimension represents the population of chromosomes. As in the previous kernel, the mutation kernel allocates an individual to a block and a gene to a thread. The same decomposition is used, so each thread corresponds to a gene and decides whether this gene will be mutated or not. D. Fitness function kernel The fitness function is based on the distance travelled by the robot within a fixed time period: the best robot is the one able to travel the largest distance in the given time. The same decomposition as in the evolutionary process is used, where we copy each block to shared memory in order to calculate the distance on the GPU. To do that, we need to transfer a buffer of the initial positions and the reached positions for the whole population to the global memory. E. Replacement kernel This phase uses Roulette Wheel selection over the parents and offspring to create the new parent population. The kernel parameters are the same as in the previous phases, with one modification: the dimensions of the kernel are derived from the parent population size. F. Statistics kernel The last kernel in the evolutionary algorithm is that of statistics. The statistics of the population are used for the selection process and for the termination decision. The maximum and minimum values of the fitness, the average and the deviation constitute the structure of the statistical data.
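Roulette Wheel selection, used here both for parent selection and for replacement, can be sketched as follows (a sequential host-side reference version with hypothetical names; the kernel performs the same cumulative search with one thread per selection):

```cpp
#include <cstddef>
#include <vector>

// Roulette-wheel (fitness-proportionate) selection: individual i is
// chosen with probability fitness[i] / sum(fitness). 'r' is a uniform
// random number in [0, 1), e.g. produced by a counter-based RNG.
std::size_t rouletteSelect(const std::vector<double>& fitness, double r) {
    double total = 0.0;
    for (double f : fitness) total += f;
    double threshold = r * total;   // where the "spin" lands on the wheel
    double cumulative = 0.0;
    for (std::size_t i = 0; i < fitness.size(); ++i) {
        cumulative += fitness[i];
        if (threshold < cumulative) return i;
    }
    return fitness.size() - 1;      // guard against rounding as r -> 1
}
```

An individual holding the entire fitness mass is always selected, and equally fit individuals are selected in proportion to r.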

Fig.3. Block threads chromosome mapping. V. EXPERIMENTS AND DISCUSSION We first present the computational performance of our algorithms for controller training, followed by the estimated evolution performance of the brain evolution. The first goal of this study is to demonstrate the potential of using GPU devices for an Evolutionary Algorithm used to evolve a recurrent neural network, which composes the brain of the humanoid robots simulated by the ODE physics simulator. A. Experimental Setup and problem configuration To evaluate the performance of our system, we compared the execution time of a sequential algorithm on a commodity CPU and an optimized algorithm on commodity GPUs. Our objective was to estimate the speedup of our algorithm with respect to the sequential one, without using the CPU's multiprocessing capabilities. The experimental hardware is shown in Table 2. The present demonstration of the GPU-CPU differences in performance is meant to give a general idea of the extent to which GPUs outperform CPUs. The results of this implementation are compared to the sequential one. The algorithms were implemented using C/C++ and CUDA (4.0), and the experiments were run on the Fermi architecture (GTX480). B. Performance Results Figure 4 shows the experimental results of the GPU and CPU implementations on the different phases of the EA. The performance in this graph is the result of 25 trials. The GPU with the proposed implementation method yielded a speedup ratio of ,36 times compared to the CPU implementation method. It is notable that parallelizing the processing of both chromosomes and their genes is more effective. This is due to the fact that the implementation enables the execution of more threads, and executing most of the EA process on the device suppresses the frequency of data transfers between the host and the device, which is the biggest challenge in any application on the GPU. TABLE 2. Environment of experimentation.
Label     Hardware             Clock   Free RAM   OS
X86       Intel Core (TM) i    GHz     8.00 GB    Windows 7
GTX480    2 NVIDIA GTX         MHz     4.00 GB    Windows 7
Fig.4. CPU vs GPU time of EA phases. After discussing the difference between the results of the GPU and CPU implementations, we examine the influence of some parameters on the overall performance. The occupation of time and memory (shared memory or registers) by the Distance function is longer than for the other kernels in both cases (n = 512) and (n = 1024); the reason behind this phenomenon is that the elements needed to calculate the Fitness kernel come from the physics simulation performed by ODE. Figure 5 shows the experimental results of the GPU and CPU implementations over different generations of the EA. The global performance of our system increases with the number of individuals in the whole population. 1) Performance discussion in Terms of Efficiency and Effectiveness For each step of the simulation on the physics simulator, a mono-core CPU implementation, a CPU-GPU one, and a CPU-GPU version using constant memory for saving the ES parameters are considered. The average time has been measured over 25 trials. Average values of the evaluation function have been collected and the number of successful tries is also reported. Since the computational time becomes excessive at 100 generations and more, the average expected time for the CPU implementation has been extrapolated from two executions. Generating and evaluating the chromosomes in parallel on the GPU provides an efficient way to speed up the evolution and the search process in comparison with a single CPU. As shown in the figure, the GPU version is already faster than the CPU one (order of acceleration of 9X). As the population size increases, a remarkable speedup is obtained. Due to highly misaligned accesses to global memory, non-coalescing memory accesses reduce the performance of the GPU implementation.
To overcome this problem and to reach coalesced memory accesses, variables of one type belonging to different individuals of the population are stored adjacently in buffers. The GPU keeps accelerating the hybrid evolutionary process as the size increases. Regarding the quality of the solutions, the results obtained by the proposed hybrid ES are quite competitive in comparison with the sequential one. The conclusion from this experiment indicates that the use of the GPU provides an efficient way to deal with this kind of application, where many parameters are considered (e.g. simulator parameters, RNN parameters, and ES parameters).

Fig.5. Global system execution on CPU and GPU. Fig.6. Global system execution speedup. So, implementing the training on a GPU has allowed us to exploit the parallelism in such an application and to improve the robustness/quality of the provided solutions; and as this problem takes a large set of parameters into account, the order of acceleration achieved is very promising for going further with optimization methods (using streams, etc.), tuning the parameters of the EA on the GPU device, using multi-GPU setups, etc. VI. CONCLUSIONS AND FUTURE WORK The first objective of this study was to demonstrate the potential of using GPU devices for an Evolutionary Algorithm used to evolve a recurrent neural network which composes the brain of humanoid robots simulated by a physics simulator. Meanwhile, after analyzing the problem, we found it more suitable to implement some parts of the physics simulator and the RNN on the GPU as well. This work has successfully performed the cited steps by implementing an optimized brain (controller) that can distribute some of its processing between GPU and CPU, allowing the developer, in an automatic mode, to decide where to process the computations. As future work, we could improve the accuracy of our technique by incorporating kernel parameter tuning methods and testing other ANNs with other evolutionary training methods. Additionally, parallelization techniques of evolutionary algorithms involving multiple populations [8] may interact favorably with this type of application. Separate populations (a GPU-based evolutionary algorithm island model) may be trained on different subsets of the training data, allowing a more methodical search of the solution space presented by the training set. VII. REFERENCES [1] S. Christely, B. Lee, X. Dai and Q. Nie. Integrative multicellular biological modeling: a case study of 3D epidermal development using GPU algorithms. In BMC Systems Biology Journal, Vol. 4, [2] J. Liu and L. Guo.
Implementation of Neural Network Backpropagation in CUDA. In Advances in Intelligent Systems and Computing, Springer-Verlag, pp , [3] N. Ouannes, N. Djedi, H. Luga, and Y. Duthen. Gait Evolution for Humanoid Robot in a Physically Simulated Environment. In Intelligent Computer Graphics 2011, Studies in Computational Intelligence, Dimitri Plemenos and Georgios Miaoulis (Eds.), Springer-Verlag, pp , [4] L.P. Maguire, T.M. McGinnity, B. Glackin, A. Ghani, A. Belatreche, and J. Harkin. Challenges for large-scale implementations of spiking neural networks on FPGAs. In Neurocomputing Journal, Vol. 71, pp. 13-29, [5] J.M. Nageswaran, N. Dutt, J.L. Krichmar, A. Nicolau, A. Veidenbaum. Efficient simulation of large-scale Spiking Neural Networks using CUDA graphics processors. In IEEE International Joint Conference on Neural Networks, pp , Los Alamitos, [6] T. V. Luong, N. Melab and E. Talbi. Parallel hybrid evolutionary algorithms on GPU. In IEEE Congress on Evolutionary Computation (CEC), Barcelona, Spain, [7] A. K. Qin, F. Raimondo, F. Forbes and Y. Soon Ong. An improved CUDA-based implementation of differential evolution on GPU. In Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp , USA, [8] J. Jaros. Multi-GPU island-based genetic algorithm for solving the knapsack problem. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Brisbane, Australia, [9] Y. Yang, P. Xiang, J. Kong, M. Mantor and H. Zhou. A unified optimizing compiler framework for different GPGPU architectures. In ACM Transactions on Architecture and Code Optimization (TACO), Vol. 9, Issue 2, June [10] R. D. Prabhu. GNeuron: Parallel Neural Networks. In 14th IEEE International Conference on High Performance Computing (HiPC 2007), Goa, India, [11] M. Peniak, A. Morse, C. Larcombe, and S. Ramirez-Contla. Aquila: An Open-Source GPU-Accelerated Toolkit for Cognitive and Neuro-Robotics Research. July 31-Aug , San Jose, CA, [12] P. G-Nalda and B. B-Cases.
Topos 2: Spiking Neural Networks for Bipedal Walking in Humanoid Robots. In Hybrid Artificial Intelligent Systems, Proceedings, Part I, The 6th International Conference, HAIS, May 23-25, Wroclaw, Poland, [13] A. Arora, R. Tulshyan, and K. Deb. Parallelization of binary and real-coded genetic algorithms on GPU using CUDA. In IEEE Congress on Evolutionary Computation, pp. 1-8, [14] P. Pospichal, J. Jaros and J. Schwarz. Parallel genetic algorithm on the CUDA architecture. In Proceedings of the 2010 International Conference on Applications of Evolutionary Computation, Vol. Part I, Springer-Verlag, Berlin, Heidelberg, pp , [15] M. Oiso, Y. Matsumura, T. Yasuda and K. Ohkura. Implementing Genetic Algorithms to CUDA Environment Using Data Parallelization. In Japan Technical Gazette, Vol. 18, No. 4, pp , [16] P. Krömer, V. Snášel, J. Platoš, and A. Abraham. Many-threaded implementation of differential evolution for the CUDA platform. In Proceedings of GECCO '11, the 13th Annual Conference on Genetic and Evolutionary Computation, pp , [17] NVIDIA: CUDA Toolkit 4.0 CURAND Guide, [18] M. P. M. Zamith, E. W. G. Clua, A. Conci, A. Montenegro, R. C. P. Leal-Toledo, P. A. Pagliosa and L. Valente. A Game Loop Architecture for the GPU Used as a Math Coprocessor in Real-Time Applications. In Computers in Entertainment (CIE), Special Issue: Media Arts, Vol. 6, [19] J. K. Salmon, M. A. Moraes, R. O. Dror and D. E. Shaw. Parallel Random Numbers: As Easy as 1, 2, 3. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), ACM Press, New York, USA.

Face Recognition Using Local Binary Patterns in One-Dimensional Space and Wavelets Amir Benzaoui and Abdelhani Boukrouche Laboratory of Inverse Problems, Modeling, Information and Systems (PI:MIS), Department of Electronics and Telecommunication, University of 08 MAI 1945, P.O. Box 401, Guelma 24000, Algeria Abstract The popular Local Binary Patterns (LBP) have been highly successful in describing and recognizing faces. However, the original LBP has several limitations which must be addressed in order to improve its performance and make it suitable for the needs of different types of problems. In this paper, we investigate a new local textural descriptor for automated human identification using 2D facial imaging. This descriptor, named One-Dimensional Local Binary Patterns (1DLBP), produces a binary code and is inspired by the classical LBP. The performance of the textural descriptor has been improved by the introduction of wavelets in order to reduce the dimensionality of the resulting vectors without losing information. The 1DLBP descriptor is assessed in comparison to the classical and extended versions of the LBP descriptor. The experimental results on two publicly available datasets, the ORL and AR databases, show that the proposed feature extraction approach, based on the 1DLBP descriptor, gives very significant improvements in recognition rates, superiority in comparison to the state of the art, and good effectiveness in unconstrained cases. Keywords: biometrics; face recognition; Local Binary Patterns (LBP); One-Dimensional Local Binary Patterns (1DLBP); Wavelets. I. INTRODUCTION Biometric systems have increasingly become an important tool in the information and public security domains, because they provide automatic identification or verification of identity based on the analysis of physical or behavioral modalities of the human body.
Several modalities have been used to recognize human identity; we can cite fingerprints, voice, iris, palm-print, retina, computer keystrokes, or signature [1, 2]. In particular, the automatic analysis of the human face has become an active research area in the artificial vision and pattern recognition domains, due to its important use in several applications such as electronic elections, biometrics, forensics and video surveillance. The human face is a dynamic entity, which changes under the influence of several factors such as pose, size, occlusion, background complexity, lighting and the presence of components such as mustaches, beards, and glasses. So, the essential question in any facial analysis problem is how to find an efficient descriptor to represent and model the face in a real context. The crucial step in any face analysis problem is the feature extraction phase. In this phase, there are two major approaches: local and global. Psychological and neuroscience studies have proved that the human visual system combines local and global features to differentiate between persons [3]. Global approaches are based on pixel information; all pixels of the facial image are treated as a single vector, whose size is the total number of the image's pixels [4]. Most methods of this approach use another representation space (subspace) to reduce the number of pixels and to eliminate redundancies. Principal Component Analysis (PCA) [5], Linear Discriminant Analysis (LDA) [6] and Independent Component Analysis (ICA) [7] are the most popular methods used to reduce the dimensions and to select the useful information. However, these approaches are not effective in unconstrained cases, i.e., situations where occlusion, lighting, pose, and size of the face are uncontrolled.
Recently, researchers have concentrated on local approaches, which are considered more robust in unconstrained cases than global approaches. Here, the face analysis is given by the individual description of its parts and their relationships; this model corresponds to the manner of perception of the human visual system. The methods of this approach are based on the extraction of features from the facial image and the definition of an adequate model to represent the face [4]. Several methods and strategies have been proposed to model and classify faces, essentially based on textures, normalized distances, angles and relations between the eyes, mouth, nose and the edge of the face. Local Binary Patterns (LBP) [8], Local Gabor Binary Patterns

(LGBP) [9] and Patterns of Oriented Edge Magnitudes (POEM) [10] are among the recent methods in this approach. The specific contributions of this paper are: An automated biometric system using 2D face imaging is developed. This work is the continuation of our previous works [11, 12], and the objective is to improve recognition performance in unconstrained situations. A new textural descriptor is proposed, called One-Dimensional Local Binary Patterns (1DLBP). It is essentially inspired by the classical LBP, projected into a one-dimensional space. The developed feature extraction approach, based on 1DLBP, is characterized by a combination of local and global features to analyze, describe, differentiate and recognize persons by their faces. The performance of the realized system has been improved with the introduction of 1D wavelets as an efficient mathematical tool for dimensionality reduction. The experimental results on two classical databases, the ORL and AR datasets, have shown that the proposed system gives very significant improvements in recognition rates, superiority in comparison to well-known classical feature extraction approaches, and good effectiveness against different external factors such as occlusion, illumination variation and noise. This paper is organized as follows: in the next section, we describe the classical LBP, histogram features and the proposed 1DLBP descriptor. In Section 3, the proposed feature extraction algorithm for face recognition is presented; for this purpose, the chi-square distance is used to measure similarities between face templates. In Section 4, we present our experimental results obtained by applying the proposed algorithm to the ORL and AR databases. Finally, a conclusion related to this work is given in Section 5. II. LOCAL TEXTURAL DESCRIPTORS Texture-based feature extraction methods play an important role in the fields of computer vision and image processing.
Several algorithms for textural feature extraction have been proposed over the past years; they can be divided mainly into statistical approaches and structural approaches. More recently, local texture descriptors have received considerable attention and have been used in several applications such as texture classification, image retrieval and object recognition. They are distinctive, robust to occlusion, illumination variation and weak lighting, and do not require segmentation. The function of a local descriptor is to convert the pixel-level information into a useful form which captures the most important content but is insensitive to irrelevant aspects caused by a varying environment. In contrast to global descriptors, which compute features directly from the entire image, local descriptors, which have proved to be more effective in real-world conditions, represent the features in small local image patches. In the following subsections, we separately discuss the description and the implementation of the LBP and 1DLBP in detail. A. Local Binary Patterns (LBP) The LBP texture analysis operator, introduced by Ojala et al. [13], is defined as a gray-scale invariant texture measure, derived from a general definition of texture in a local neighborhood. It is a powerful means of texture description; among its properties in real-world applications, we note its discriminative power, computational simplicity and tolerance against monotonic gray-scale changes. The original LBP operator forms labels for the image pixels by thresholding the 3x3 neighborhood of each pixel with the center value and considering the result as a binary number. The histogram of these different labels can then be used as a texture descriptor for further analysis. This process is illustrated in Fig. 1. Fig. 1 Calculation of the original LBP operator. The LBP operator has been extended to use neighborhoods of different sizes.
Using a circular neighborhood and bilinearly interpolating values at non-integer pixel coordinates allow any radius and any number of pixels in the neighborhood. Each LBP label (or code) can be regarded as a micro-texton. Local primitives which are codified by these labels include different types of curved edges, spots, flat areas, etc. The notation (P, R) is generally used for pixel neighborhoods, referring to P sampling points on a circle of radius R, as shown in Fig. 2. Examples of LBP applications with different masks are shown in Fig. 3. The calculation of the LBP codes can easily be done in a single scan through the image. The value of the LBP code of a pixel (x_c, y_c) is given by: LBP_{P,R}(x_c, y_c) = sum_{p=0}^{P-1} s(g_p - g_c) * 2^p, with s(x) = 1 if x >= 0 and s(x) = 0 otherwise, where g_c and g_p are respectively the values of the central pixel and its 2D neighbors.
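The original 3x3 LBP computation described above can be sketched in C++ as follows (a reference version; the clockwise neighbor ordering and function name are illustrative assumptions, not taken from the paper):

```cpp
#include <array>

// Original 3x3 LBP: threshold the 8 neighbors against the center
// value and read the resulting bits as a binary number. Neighbor p
// contributes weight 2^p, i.e. s(g_p - g_c) * 2^p summed over p.
int lbpCode(int center, const std::array<int, 8>& neighbors) {
    int code = 0;
    for (int p = 0; p < 8; ++p) {
        if (neighbors[p] >= center)   // s(g_p - g_c) = 1 when g_p >= g_c
            code |= 1 << p;           // set bit p (weight 2^p)
    }
    return code;
}
```

A flat patch (all neighbors equal to the center) maps to code 255, and a patch whose neighbors are all darker than the center maps to code 0, illustrating the operator's invariance to monotonic gray-scale shifts.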

Fig. 2 Neighborhood set for different (P, R). Fig. 3 Examples of LBP application with different masks (original image, LBP(8,1), LBP(8,2), LBP(16,2)). The occurrences of the LBP codes in the image can be collected into a histogram. The classification can then be performed by computing histogram similarities. For an efficient representation, facial images are first divided into several local regions, from which LBP histograms are extracted and then concatenated into an enhanced feature histogram for classification [14]. In Ref. [3], psychological and neuroscience studies showed that the human visual system combines local and global features to recognize and differentiate between people. On the other hand, the extended versions of the LBP operator give good results by capturing the local patterns and the micro-features of the human face, but they do not capture the global patterns that can be considered as dominant structures in the image, which contradicts the theory of recognition demonstrated in the neuroscience and psychological sciences. B. One-Dimensional Local Binary Patterns (1DLBP) The Local Binary Pattern projected into one-dimensional space (1DLBP) was introduced by L. Houam et al. [15, 16]; it has been combined with wavelets to classify X-ray bone images for bone disease diagnosis. The 1DLBP method produces a binary code describing the local agitation of a segment of a 1D signal. It is calculated by thresholding the neighborhood values with the central value: all neighbors get the value 1 if they are greater than or equal to the current element, and 0 otherwise. Then, each element of the resulting vector is multiplied by a weight according to its position (see Fig. 4.c). Finally, the current element is replaced by the sum of the resulting vector. This can be recapitulated as follows: 1DLBP(c) = sum_{n=0}^{N-1} s(v_n - v_c) * 2^n, where v_c and v_n are respectively the values of the central element and its 1D neighbors, and s is the same thresholding function as for the LBP. The index n increases from left to right in the 1D string, as shown in Figure 4.c. The 1DLBP descriptor is defined by the histogram of the 1D patterns. Fig. 4 Example of 1DLBP Application. III. PROPOSED APPROACH The proposed biometric system requires two phases of operation. The first phase is the training, which consists of recording face features from each individual in order to create his own biometric template; this template is stored in the database. The second phase is the test, which consists of recording the same features and comparing them to the biometric templates stored in the database. If the recorded data match a biometric template from the database, the individual is considered identified. The proposed algorithm used to extract information for face description and recognition is summarized as follows: a. Preprocessing. b. Multi-block decomposition of the image. c. Projection of each decomposed block into one-dimensional space.
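Template matching in the proposed system relies on the chi-square distance between histograms; its standard form is chi2(H1, H2) = sum_i (H1_i - H2_i)^2 / (H1_i + H2_i). A minimal C++ version (function name hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Chi-square distance between two (e.g. LBP or 1DLBP) histograms.
// Bins where both histograms are zero contribute nothing, avoiding
// division by zero. Smaller distance means more similar templates.
double chiSquare(const std::vector<double>& h1, const std::vector<double>& h2) {
    double d = 0.0;
    for (std::size_t i = 0; i < h1.size() && i < h2.size(); ++i) {
        double sum = h1[i] + h2[i];
        if (sum > 0.0) {
            double diff = h1[i] - h2[i];
            d += diff * diff / sum;
        }
    }
    return d;
}
```

Identical histograms give a distance of 0; at test time, the identity whose stored template minimizes this distance to the probe histogram is returned.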

Convergence of Social, Mobile and Cloud: 7 Steps to Ensure Success

Convergence of Social, Mobile and Cloud: 7 Steps to Ensure Success Convergence of Social, Mobile and Cloud: 7 Steps to Ensure Success June, 2013 Contents Executive Overview...4 Business Innovation & Transformation...5 Roadmap for Social, Mobile and Cloud Solutions...7

More information

Google Apps as an Alternative to Microsoft Office in a Multinational Company

Google Apps as an Alternative to Microsoft Office in a Multinational Company Google Apps as an Alternative to Microsoft Office in a Multinational Company The GAPS Project Thesis presented in order to obtain the Bachelor s degree HES by: Luc BOURQUIN Supervisor: Thierry CEILLIER,

More information

Impact of Mobile Technologies on Enterprises: Strategies, Success Factors, Recommendations

Impact of Mobile Technologies on Enterprises: Strategies, Success Factors, Recommendations Reports & Publications Impact of Mobile Technologies on Enterprises: Strategies, Success Factors, Recommendations A study by Stefan Stieglitz and Tobias Brockmann published by the Vodafone Institute for

More information



More information

Business plan for the mobile application 'Whizzbit'

Business plan for the mobile application 'Whizzbit' Business plan for the mobile application 'Whizzbit' Tom Leleu Promotoren: prof. ir. Ludo Theunissen, dhr. Pascal Vande Velde Masterproef ingediend tot het behalen van de academische graad van Master in

More information

Business Intelligence for Small Enterprises

Business Intelligence for Small Enterprises THE ROYAL INSTITUTE OF TECHNOLOGY Business Intelligence for Small Enterprises An Open Source Approach Rustam Aliyev May 2008 Master thesis at the Department of Computer and Systems Sciences at the Stockholm

More information Coordination Action on Digital Library Interoperability, Best Practices and Modelling Foundations Coordination Action on Digital Library Interoperability, Best Practices and Modelling Foundations Coordination Action on Digital Library Interoperability, Best Practices and Modelling Foundations Funded under the Seventh Framework Programme, ICT Programme Cultural Heritage and Technology Enhanced

More information

MDA Journal A BPT COLUMN. David S. Frankel. January 2004. Until February. David Frankel

MDA Journal A BPT COLUMN. David S. Frankel. January 2004. Until February. David Frankel MDA Journal MDA Journal January 2004 Over the past year, Microsoft has given indications that it takes model-driven approaches to software seriously. Statements emanated from the top of the company about

More information

Arbeitsberichte der Hochschule für Wirtschaft FHNW Nr. 28. Enterprise Architectures for Cloud Computing

Arbeitsberichte der Hochschule für Wirtschaft FHNW Nr. 28. Enterprise Architectures for Cloud Computing Arbeitsberichte der Hochschule für Wirtschaft FHNW Nr. 28 Enterprise Architectures for Cloud Computing Laura Aureli, Arianna Pierfranceschi, Holger Wache ISSN Nr. 1662-3266 (Print) Nr. 1662-3274 (Online)

More information

Analysis of the state of the art and defining the scope

Analysis of the state of the art and defining the scope Grant Agreement N FP7-318484 Title: Authors: Editor: Reviewers: Analysis of the state of the art and defining the scope Danilo Ardagna (POLIMI), Giuliano Casale (IMPERIAL), Ciprian Craciun (IEAT), Michele

More information

What are requirements?

What are requirements? 2004 Steve Easterbrook. DRAFT PLEASE DO NOT CIRCULATE page 1 C H A P T E R 2 What are requirements? The simple question what are requirements? turns out not to have a simple answer. In this chapter we

More information

Business Intelligence Software Customers Understanding, Expectations and Needs

Business Intelligence Software Customers Understanding, Expectations and Needs Business Intelligence Software 1 Running head: BUSINESS INTELLIGENCE SOFTWARE Business Intelligence Software Customers Understanding, Expectations and Needs Adis Sabanovic Thesis for the Master s degree

More information

Digital Forensic Trends and Future

Digital Forensic Trends and Future Digital Forensic Trends and Future Farhood Norouzizadeh Dezfoli, Ali Dehghantanha, Ramlan Mahmoud, Nor Fazlida Binti Mohd Sani, Farid Daryabar Faculty of Computer Science and Information Technology University

More information

Execute This! Analyzing Unsafe and Malicious Dynamic Code Loading in Android Applications

Execute This! Analyzing Unsafe and Malicious Dynamic Code Loading in Android Applications Execute This! Analyzing Unsafe and Malicious Dynamic Code Loading in Android Applications Sebastian Poeplau, Yanick Fratantonio, Antonio Bianchi, Christopher Kruegel, Giovanni Vigna UC Santa Barbara Santa

More information



More information

Introduction to Recommender Systems Handbook

Introduction to Recommender Systems Handbook Chapter 1 Introduction to Recommender Systems Handbook Francesco Ricci, Lior Rokach and Bracha Shapira Abstract Recommender Systems (RSs) are software tools and techniques providing suggestions for items

More information

Development of a 3D tool for visualization of different software artifacts and their relationships. David Montaño Ramírez

Development of a 3D tool for visualization of different software artifacts and their relationships. David Montaño Ramírez Development of a 3D tool for visualization of different software artifacts and their relationships David Montaño Ramírez Development of a 3D tool for visualization of different software artifacts and their

More information

These materials are the copyright of John Wiley & Sons, Inc. and any dissemination, distribution, or unauthorized use is strictly prohibited.

These materials are the copyright of John Wiley & Sons, Inc. and any dissemination, distribution, or unauthorized use is strictly prohibited. DevOps IBM Limited Edition DevOps IBM Limited Edition by Sanjeev Sharma DevOps For Dummies, IBM Limited Edition Published by John Wiley & Sons, Inc. 111 River St. Hoboken, NJ 07030-5774

More information

Creating Mobile Learning. 7 key steps to designing and developing effective mobile learning

Creating Mobile Learning. 7 key steps to designing and developing effective mobile learning Creating Mobile Learning 7 key steps to designing and developing effective mobile learning kineo Creating Mobile Learning Scoping and scheduling your mobile Step 1: 03 learning project Producing the overall

More information

SOA Development and Service Identification. A Case Study on Method Use, Context and Success Factors

SOA Development and Service Identification. A Case Study on Method Use, Context and Success Factors Frankfurt School Working Paper Series No. 189 SOA Development and Service Identification A Case Study on Method Use, Context and Success Factors by René Börner, Matthias Goeken and Fethi Rabhi April 2012

More information

An architectural blueprint for autonomic computing.

An architectural blueprint for autonomic computing. Autonomic Computing White Paper An architectural blueprint for autonomic computing. June 2005 Third Edition Page 2 Contents 1. Introduction 3 Autonomic computing 4 Self-management attributes of system

More information

Computing at School Working Group endorsed by BCS, Microsoft, Google and Intellect. March 2012

Computing at School Working Group endorsed by BCS, Microsoft, Google and Intellect. March 2012 Computing at School Working Group endorsed by BCS, Microsoft, Google and Intellect March 2012 Copyright 2012 Computing At School This work is licensed under the Creative

More information

The Essential Guide to Mobile App Testing

The Essential Guide to Mobile App Testing The Essential Guide to Mobile App Testing Tips, techniques & trends for developing, testing and launching mobile applications that delight your users A Free Book from utest The Essential Guide to Mobile

More information

Cloud-Based Software Engineering

Cloud-Based Software Engineering Cloud-Based Software Engineering PROCEEDINGS OF THE SEMINAR NO. 58312107 DR. JÜRGEN MÜNCH 5.8.2013 Professor Faculty of Science Department of Computer Science EDITORS Prof. Dr. Jürgen Münch Simo Mäkinen,

More information

4everedit Team-Based Process Documentation Management *

4everedit Team-Based Process Documentation Management * 4everedit Team-Based Process Documentation Management * Michael Meisinger Institut für Informatik Technische Universität München Boltzmannstr. 3 D-85748 Garching Andreas Rausch Fachbereich

More information



More information

A Model Curriculum for K 12 Computer Science: Curriculum. Computer. Final Report of the ACM K 12 Task Force. Committee

A Model Curriculum for K 12 Computer Science: Curriculum. Computer. Final Report of the ACM K 12 Task Force. Committee A Model Curriculum for K 12 Computer Science: Final Report of the ACM K 12 Task Force Curriculum Committee Computer Science Teachers Association Realizing its commitment to K-12 education A Model Curriculum

More information

Conference Paper Sustaining a federation of Future Internet experimental facilities

Conference Paper Sustaining a federation of Future Internet experimental facilities econstor Der Open-Access-Publikationsserver der ZBW Leibniz-Informationszentrum Wirtschaft The Open Access Publication Server of the ZBW Leibniz Information Centre for Economics Van Ooteghem,

More information


PROJECT FINAL REPORT PROJECT FINAL REPORT Grant Agreement number: 212117 Project acronym: FUTUREFARM Project title: FUTUREFARM-Integration of Farm Management Information Systems to support real-time management decisions and

More information

The Future of Mobile Enterprise Security

The Future of Mobile Enterprise Security The Future of Mobile Enterprise Security Gearing Up for Ubiquitous Computing August 2014 795 Folsom Street, 1 st Floor San Francisco, CA 94107 Tel.: 415.685.3392 Fax: 415.373.3892 Contents Introduction...

More information