Decentralised diagnosis of discrete-event systems: application to telecommunication network



Similar documents
Modeling Fault Propagation in Telecommunications Networks for Diagnosis Purposes

Diagnosis and Fault-Tolerant Control

Diagnosis of Large Software Systems Based on Colored Petri Nets

Specification and Analysis of Contracts Lecture 1 Introduction

Quality of Service in Industrial Ethernet Networks

Runtime Verification - Monitor-oriented Programming - Monitor-based Runtime Reflection

Rapid Prototyping of a Frequency Hopping Ad Hoc Network System

780 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 17, NO. 4, JULY /$ IEEE

The Model Checker SPIN

Formal Verification by Model Checking

BENEFIT OF DYNAMIC USE CASES TO EARLY DESIGN A DRIVING ASSISTANCE SYSTEM FOR PEDESTRIAN/TRUCK COLLISION AVOIDANCE

Attenuation (amplitude of the wave loses strength thereby the signal power) Refraction Reflection Shadowing Scattering Diffraction

Using ModelSim, Matlab/Simulink and NS for Simulation of Distributed Systems

Mixed-Criticality Systems Based on Time- Triggered Ethernet with Multiple Ring Topologies. University of Siegen Mohammed Abuteir, Roman Obermaisser

Wireless Sensor Network Performance Monitoring

Utility Communications FOXMAN-UN Network Management System for ABB Communication Equipment

Instructions for Setting the T560 Digital Delay Generator for the Target Delay

Division of Mathematical Sciences

Influence of Load Balancing on Quality of Real Time Data Transmission*

RESOURCE ALLOCATION FOR INTERACTIVE TRAFFIC CLASS OVER GPRS

Solvency II and Predictive Analytics in LTC and Beyond HOW U.S. COMPANIES CAN IMPROVE ERM BY USING ADVANCED

I.S. 1 remote I/O system Redundant coupling via PROFIBUS DP

Doctor of Philosophy in Computer Science

Practical Machine Learning and Data Analysis

COMMUNICATIONS SYSTEMS ANALYST

GSM Architecture Training Document

NOVEL PRIORITISED EGPRS MEDIUM ACCESS REGIME FOR REDUCED FILE TRANSFER DELAY DURING CONGESTED PERIODS

Mobile Communications TCS 455

Network Simulator: A Learning Tool for Wireless Technologies

MEDIA TECHNOLOGY & INNOVATION. General issues to be considered when planning SFNs

In-Vehicle Networking

Bosch Packaging Academy Essential Training

Kirsten Sinclair SyntheSys Systems Engineers

Hydraulic Pipeline Application Modules PSI s Tools to Support Pipeline Operation

Authors Mário Serafim Nunes IST / INESC-ID Lisbon, Portugal mario.nunes@inesc-id.pt

PROFINET IO Diagnostics 1

THE ORGANISATION. Senior Management Major end users (divisions) Information Systems Department

Frame Burst Adjusting for Transmitting Video Conference in Gigabit Ethernet

INTERNATIONAL TELECOMMUNICATION UNION $!4! #/--5.)#!4)/. /6%2 4(% 4%,%0(/.%.%47/2+

APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM

Knowledge based energy management for public buildings through holistic information modeling and 3D visualization. Ing. Antonio Sacchetti TERA SRL

Disaster-Resilient Backbone and Access Networks

T Reactive Systems: Introduction and Finite State Automata

BOOLEAN CONSENSUS FOR SOCIETIES OF ROBOTS

New technologies applied to Carrier Monitoring Software Systems. Juan Carlos Sánchez Integrasys S.A.

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses

INTERNATIONAL TELECOMMUNICATION UNION

INTERNATIONAL TELECOMMUNICATION UNION. SERIES V: DATA COMMUNICATION OVER THE TELEPHONE NETWORK Interfaces and voiceband modems

Configuring PROFINET

Introduction. Chapter 1

Making Ethernet Over SONET Fit a Transport Network Operations Model

MONITORING AND DIAGNOSIS OF A MULTI-STAGE MANUFACTURING PROCESS USING BAYESIAN NETWORKS

Release: 1. CPPSEC3020A Monitor security from control room

Communication Networks. MAP-TELE 2011/12 José Ruela

Lecture 12 Transport Networks (SONET) and circuit-switched networks

FAULT MANAGEMENT SERVICE IN ATM NETWORKS USING TINA NETWORK RESOURCE ARCHITECTURE

A Passive Method for Estimating End-to-End TCP Packet Loss

Testing Low Power Designs with Power-Aware Test Manage Manufacturing Test Power Issues with DFTMAX and TetraMAX

RailML use in the project

Oscillations of the Sending Window in Compound TCP

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

Secure your electrical network. PACiS The Safety Solution for Healthcare Institutions

School of Computer Science

FUNCTIONAL MODULES FOR INTERMIXED PLANNING AND EXECUTION OF AN OBSERVATION MISSION

Formal Verification of Software

USING LOCAL NETWORK AUDIT SENSORS AS DATA SOURCES FOR INTRUSION DETECTION. Integrated Information Systems Group, Ruhr University Bochum, Germany

STANDPOINT FOR QUALITY-OF-SERVICE MEASUREMENT

Interconnection Networks

Software Engineering UNIT -1 OVERVIEW

CHAPTER 18 THE PUBLIC TELEPHONE NETWORK # DEFINITIONS TERMS

Measuring and Understanding IPTV Networks

Cable Network Transparency Fiber Optic Monitoring

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

An Integrated Network Management System for SDH Broadcasting Networks

Lecture overview. History of cellular systems (1G) GSM introduction. Basic architecture of GSM system. Basic radio transmission parameters of GSM

/ transport Supervision at The Heart of Transport Network Security

Introduction to Optical Networks

Monitoring for Handover from TDD to GSM

A MODERN DISTRIBUTION MANAGEMENT SYSTEM FOR REGIONAL ELECTRICITY COMPANIES

First Midterm for ECE374 03/24/11 Solution!!

Enforcing Security Policies. Rahul Gera

Concept and Project Objectives

National Education Technology Standards

Discrete-Event Simulation

Introduction to Formal Methods. Các Phương Pháp Hình Thức Cho Phát Triển Phần Mềm

How To Understand The Differences Between A Fax And A Fax On A G3 Network

Radiological Assessment Display and Control System

Transcription:

Decentralised diagnosis of discrete-event systems: application to telecommunication network Yannick Pencolé CSL, Yannick.Pencole@anu.edu.au in collaboration with M.-O. Cordier and L. Rozé

CSL Seminar 1 / 40 Telecommunication network Set of connected components Distributed system Metropolitan/Wide Area Network Purpose: transmission of data between clients (companies) Network management: providing a good quality of services Using all the network resource with a minimal cost Traffic management, failure management

CSL Seminar 2 / 40 Network monitoring Supervision center Purpose: Identify the problems that could have occurred which explain the received alarms Alarms Failure Detection, Localisation, Identification, Propagation Failure

CSL Seminar 3 / 40 Needs for monitoring Supervised network: large scale system Very important number of received alarms per day (several thousands) Supervisor: human agents Analysis of the received alarms: complex problem, in particular if we need to determine the problems quickly An automatic system to help in the alarm interpretation is necessary Existing systems [Sloman86][Jakobson93] Expert systems, correlation alarm systems Problem: evolutive system

CSL Seminar 4 / 40 Context and purposes To propose a failure diagnosis system Taking into account the evolutivity of the supervised system Producing complete and concise diagnoses Online diagnosis approach: need of efficiency Our proposal: decentralised approach Context: MAGDA project Academic partners: IRISA, LIPN Industrial partners: Alcatel, France Telecom, Ilog

CSL Seminar 5 / 40 Diagnosis: principles Know Understand Model Diagnosis (Norms, expertises, system specification) Act (Reparation) See (Sensors) System

CSL Seminar 6 / 40 KNOW: Model Model of discrete-event systems Failure: occurrence of an event which can change the state of a component Interaction between components: message exchanges (emission/reception) Alarm: emission of an observable event by a component Behaviour of the system Nominal behaviour, Faulty behaviour Used formalism: Set of communicating automata

CSL Seminar 7 / 40 Model of component Γ i = (Q i, Exo i Rcv i, Obs i Emit i, T i ) Q i finite set of states, modes of behaviour (faulty or not) Reception events Exo i exogenous events: failures, actions from the environment Rcv i internal events: reception of messages from other components (event propagation) Emission events Emit i internal events: emission of messages to other components (event propagation) Obs i observable events: emission of alarms to the supervisor T i Q i Exo i Rcv i 2 Obs i Emit i Q i set of transitions

CSL Seminar 8 / 40 Model of component: example I21/{ O11 } 0 1 F1/{} F1/{ I12 } F2/{ O12, I12 } 3 I21/{} 2 Exo i : F1 F2, Rcv i : I21 Emit i : I12, Obs i : O11 O12

CSL Seminar 9 / 40 Model of the system Set of components: Γ {Γ 1,..., Γ n } O11,O12 O21 F3,F4 Γ I12 I21 Γ 1 2 I24 F1, F2 I23 I32,I 32 Γ 4 O41 Γ 3 I34 F5 Global behaviour obtained by synchronised product (synchronisation on internal events): Γ = n i 1 Γ i

CSL Seminar 10 / 40 SEE: Observations Set of sensors Observation channels Γi Γ j CHANNELS emission reception (t) (t ) Sensor Different propagation delays Instantaneous With a known maximum delay D Observation: reception of a message from a component by a sensor On a sensor: order of reception order of emission! 2 sensors may not have synchronised clocks!

CSL Seminar 11 / 40 Observations: partial order O: set of observations (message with a date of reception) : partial order relation on the observations based on the observability of the system Number of sensors (synchronised clocks?) Characteristics of the channels Instantaneous? FIFO? propagation delay? Sequence O11 O21 O41 O11 O41 O11 O41 O11 O21 O41 Partial order set

CSL Seminar 12 / 40 UNDERSTAND: diagnosis Purpose: to explain the observations by the occurrence of failures (permanent, intermittent) Given the model Γ, given the observations O, how to find the behaviours modelled in Γ that are compatible with O. Diagnosis Set of behaviours Sequences of events that could have occurred on the supervised system Diagnosis = Transition system

CSL Seminar 13 / 40 Centralised approaches Centralised approaches Need of the Global Model Γ = n i 1 Γ i Diagnoser approach [Sampath et al.][rozé et al.] Based on an observer: a finite-state machine which represents the set of observable behaviours from the supervised system Diagnosis information: contained in the states of the observer Advantage: the computation of the diagnosis is efficient (parsing of the observer). Drawback: computation of the diagnoser, good luck! Worst case size of Γ : 2 n (MAGDA project = small network = (n = 57)) Worst case size of Diagnoser( Γ ): 2 2n

CSL Seminar 14 / 40 Decentralised approaches Principle: Divide and conquer Divide: Computation of a set of subsystem diagnoses γ1 (O γ1 ),..., γm (O γm ) Diagnosis which explains observations O γi from a subsystem γ i = {Γ i1,..., Γ ik } by a set of subsystem behaviours Explanation based on the hypothesis the subsystem γ i is independent from the others Conquer: Merge of the subsystem diagnoses to get the global diagnosis (O) = Merge( γ1 (O γ1 ),..., γm (O γm )) Purpose: diagnosed interactions checking

CSL Seminar 15 / 40 Centralised/Decentralised Γ={Γ,...,Γ } 1 n Global Model Γ {γ,...,γ } 1 m Subsystems Global Diagnosis { (Ο ),..., (Ο )} γ1 γ1 γ m γm Merge (Ο )

CSL Seminar 16 / 40 Advantages of a decentralised approach The global model is not necessary Use of tractable models Supervised systems: distributed systems, well-suited approach More adapted to the evolution, the reconfiguration of a component of the system

CSL Seminar 17 / 40 Decentralised approaches: previous works Decentralised and coordinated diagnosers [Debouk et al.] One diagnoser by sensor: computes a local diagnosis Merge: communication protocols between diagnosers to compute the global diagnosis Problem: the local diagnosers still need the computation of Γ Diagnosis of active systems [Baroni et al.] Simulation of the decentralised model Γ constrained by the received observations Simulation by subsystems (subsystem diagnosis), and generalisation of the simulation (Merge) Disadvantage: offline method, can t be used as a monitoring system (offline diagnosis approach)

CSL Seminar 18 / 40 Diagnosis representation Diagnosis (subsystem and global) Set of behaviours Occurrence of failures and their propagations Can be represented by a communicating automaton Example: diagnosis of the subsystem γ 1 = {Γ 1 } (observation O12) I21/{ } F1 /{ } 3,{ O12 } 4,{ O12 } 0,{} F2 /{ } O12 I12 2,{ O12} F1 /{ I12 } 1,{ O12}

CSL Seminar 19 / 40 Reduced representation Diagnosis: set of transition paths the number of paths can be important essentialy due to the concurrency of the system need of a reduced representation In the reduced representation, Paths = Event traces [Mazurkiewicz 86] equivalent class of event sequences equivalence based on event independency (concurrency) Partial order reduction technique

CSL Seminar 20 / 40 Reduced representation: example If the diagnosis consists of the following sequences 1. F1/{} F3/{O31} F4/{} 2. F1/{} F4/{} F3/{O31} 3. F3/{O31} F1/{} F4/{} If we know that F1/{} and F3/{O31} independent, F3/{O31} and F4/{} independent The following path is sufficient to represent the diagnosis by successive permutations of consecutive independent events F1/{ } F3/{ O31} F4 /{ }

CSL Seminar 21 / 40 Local diagnosis computation Given a subsystem γ i {Γ i1,..., Γ ik } Given O γi the set of observations from γ i Purpose: find the set of paths from γ i j {i 1,...,i k } Γ j explaining O γi γ i has been chosen to be tractable A centralised approach can be applied on γ i Use of an adaptation of the diagnoser approach [Sampath et al.] in order to have an efficient computation Computes paths representing traces Noted red γ i (O γi ).

CSL Seminar 22 / 40 Local diagnoses and interactions Local diagnosis of γ i : Inform about the possible interactions with the neighbours of γ i Interaction: exchange of events, synchronisation I21 F3/{ I21 } I21/{ O11} γ i γ j

CSL Seminar 23 / 40 Merge operation: characteristics Purpose: check the interactions between the local diagnoses. Merge operation ( ): synchronised and reduced product on automata Merge operation recipe: 1. define a dependence relation on the events of the system 2. mix the results from [Arnold92] (synchronised product of transition system) with the sleep set algorithm from [Peled93] (partial order exploration based on the dependence relation) and you have: red γ i γ j (P γi γ j (O)) = red (P γi (O)) red (P γj (O)) γ i γ j where P γi (O) is the partial order set of observations extracted from O (projection) which have been emitted by γ i

CSL Seminar 24 / 40 Merge operation: example I21/{ } F1 /{ } 3,{ O12 } 4,{ O12 } 0,{} F2 /{ O12 I12 } 2,{ O12} γ 1 F1 /{ } 1,{ O12} γ I12/{ O31 } F3/{ } 0,{} 1,{ O31} 2,{ O31} 2 (0,0),{ } F2 /{ } O12O31 F1/{ } F3/{ } (2,1),{ O12O31 } (1,1),{ O12O31 } (1,2),{ O12O31 } Merge

CSL Seminar 25 / 40 Merge strategy is based on a product operation, it can be not efficient! we have to use it meanly, when necessary. We need a plan for the application of the merge Strategy based on the information contained in the local diagnoses What are the diagnoses to merge? The less I merge, the more efficient I am! The strategy is defined with 2 rules Incompatible path detection Selection of dependent diagnoses

CSL Seminar 26 / 40 Rule 1: Incompatible trajectory detection Let E i be the set of exchanged events (interactions) of the subsystem γ i according to its diagnosis Every event e exchanged between γ i and γ j is necessary such that: e E i E j If not, every path containing e in the diagnosis γi is incompatible Rule 1: elimination of incompatible paths before applying the operation on the local diagnoses.

CSL Seminar 27 / 40 Rule 2: Selection of diagnoses Basic idea: merging two diagnoses that do not interact each other 1. is roughly equivalent to make the Cartesian product 2. is useless, their interactions are not checked Rule 2: Only merge diagnoses which interact each other It is possible to apply the strategy in a parallel way (distributed application) The result is a set of independent diagnoses. 1. Each diagnosis gives the explanations of the observations from a part of the system 2. There is no interaction between the diagnosed parts

CSL Seminar 28 / 40 Summary of the approach Local Diagnoser Local diagnosis Model Obs Strategy + Merge Result Obs OFFLINE ONLINE Obs

CSL Seminar 29 / 40 Incremental diagnosis Observations: a continuous flow of alarms Every observation is in a temporal window W 1,..., W m Given the observations from the window W j the diagnosis explaining the observations from W 1,..., W j 1 How to efficiently compute the diagnosis explaining the observations from W 1,..., W j? Incremental diagnosis computation

CSL Seminar 30 / 40 Difficulties Generally, at the end of any temporal window, we do not have the guarantee that the received observations can be explained! Some observations might be missing O11 O21 O31 O21 O11 reception time emission time Wj O11 is not received during Wj but can be necessary to make a diagnosis The sequence O11 O21 O31 O21 may have no explanation!

CSL Seminar 31 / 40 First solution: sound temporal windows Detection of sound temporal windows An observation is emitted and received in the same window Can be detected relying on the observation channel properties Incremental diagnosis computation: 1. From the current diagnosis states at the end of W j 1 2. Computation of the global diagnosis explaining the observations from W j 3. Refinement algorithm: 1,...,j 1,...,j 1 Wj

CSL Seminar 32 / 40 General solution In the worst case, there is no sound temporal window! Need of an extended diagnosis ext W j explains the observations of W j and some hypothetical observations, emitted before the end of W j but not received yet has more explanations than the real one Incremental diagnosis computation: same algorithm as before We have the guarantee that if W j is sound ext 1...j = 1...j

CSL Seminar 33 / 40 Transpac network French packet switching network Experiments done on a sub-part of the network 8 switches, 32 control stations, 2 technical centers Diagnosis difficulty: masking phenomenon One studied scenario with 56 alarms Multiple faults diagnosis (masking phenomenon) Result obtained in 8 seconds

CSL Seminar 34 / 40 Magda project: SDH network

CSL Seminar 35 / 40 Magda project: Montrouge ADM

CSL Seminar 36 / 40 Magda project: Supervision chain (November 2001) ILOG interface Diagnosis Software Irisa Communication via Corba Alarm Sensors Alcatel Network Simulation Alcatel

CSL Seminar 37 / 40 Magda project: Interface

CSL Seminar 38 / 40 Magda project: studied scenarios Scenarios Strategy 1 Strategy 2 Strategy 3 Strategy 4 1 3s 590ms 4s 200ms 16s 540ms >5mn 2 1s 300ms 1s 300ms 1mn 52s 770ms >5mn 3 1s 780ms 1s 910ms >5mn >5mn 4 1s 600ms 2s 30ms 49s 120ms >5mn 5 2s 620ms 5s 500ms 5s 430ms 3mn 45s 600ms 6 1s 780ms 2s 320ms 24s 240ms 57s 440ms 7 1s 480ms 1s 700ms 2mn 54s 920ms >5mn 8 1s 830ms 3s 90ms 3s 30ms >5mn Eight studied scenarios Strategy 1: The previously described strategy Strategy 2: Perturbation of the order of merging Strategy 3: Same as 1 without incompatible path elimination Strategy 4: Same as 2 without incompatible path elimination

CSL Seminar 39 / 40 Conclusions What a funny challenge it was!! Main problem: the use of centralised approches impossible Large scale DES: problem of spatial complexity Framework of a decentralised diagnosis approach Divide and conquer principle Transfer of a part of spatial complexity to temporal complexity In practice, the number of behaviours explaining a set of observations is very small compared to the number of behaviours of the system Conquer Need of merging strategies, Use of diagnosis trace representatives Incremental algorithms

CSL Seminar 40 / 40 Perspectives How to take benefits from the symbolic representation techniques inside this framework? (BDDs) How about the diagnosability test of such systems? How about using diagnosability for making diagnosis abstractions? How to take into account reconfigurations of the system? How to mix with planning approaches (repairing plans)? Large scale autonomous systems