Lecture 12: Network Management Architecture
Prof. Shervin Shirmohammadi
SITE, University of Ottawa
CEG 4185

Defining Network Management
Network management contains multiple layers:
- Business management: budgets, resources, agreements, etc.
- Service management: access bandwidth, data storage, application delivery, SLAs
- Network management: the entire network and its devices
- Element management: a single router, switch, hub, etc.
Network Management Tasks
Two basic functions: the transport of management information, and the management of network elements.
Tasks:
- Monitoring for event notification: events are generally associated with alarm triggers (security, performance, failures, etc.).
- Monitoring for metrics and planning: trend analysis to determine long-term behaviours and trends (for example, for your design you had to capture user data).
- Configuration of network parameters: setting parameters in network devices.
- Troubleshooting the network: determining what caused a fault.

Network Devices and Characteristics
- Network elements: hosts, routers, switches, Data Service Units (DSUs), hubs, NICs, cable segments, etc.
- End-to-end characteristics: characteristics that can be measured across multiple network elements.
- Per-link and per-element characteristics: specific to the type of element being managed.
Management Mechanisms
Management is done through utilities (ping, tracert, etc.) and protocols (SNMP, CMIP, CMOT).
- Utilities are used in service metric instrumentation and collection.
- Protocols allow us to retrieve, change, and transport management data across the network.
Three categories of mechanisms:
- Monitoring mechanisms
- Instrumentation mechanisms
- Configuration mechanisms

Monitoring Mechanisms
Monitoring: obtaining values for end-to-end, per-link, and per-element characteristics.
- Values are usually collected through polling with a management protocol, such as SNMP.
- Gathered data may not directly reflect the characteristics; these may have to be extracted or calculated from the data.
Design considerations:
- Data and alarms need to be displayed (logs, graphs, etc.).
- Data and events also need to be stored. This can be done in multiple steps: primary, secondary, and tertiary storage.
Monitoring for Event Notification
Event: something noteworthy that occurs in the network. Most of the time this is a problem or a failure in a network element.
- Thresholds may be set on end-to-end or element characteristics for notification of events. This is known as real-time analysis.
- Real-time analysis usually involves short polling intervals, which require network capacity, CPU, memory, and storage. The resulting traffic is not insignificant!

Traffic Example
A network has 100 routers, each with 4 interfaces, each with 8 characteristics. Polling is every 5 seconds. How much monitoring overhead traffic is generated?
- 100 network devices x 4 interfaces per device x 8 characteristics per interface = 3200 characteristics.
- Assume each characteristic = 8 bytes of data + 60 bytes of overhead. (Why so much overhead?)
- Total traffic per poll = 3200 x (8 + 60) bytes = 217.6 KB = 1.74 Mb.
- If this were spread evenly over the polling interval: 1.74 Mb / 5 s = 348 Kbps. (Not likely!)
- More likely it is a burst of 1.74 Mb every 5 seconds.
- Over a day: 1.74 Mb x 720 polls per hour x 24 hours per day = about 30 Gb of traffic, and 3200 x 8 bytes x 720 x 24 = about 442 MB of data stored per day.
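The arithmetic of the traffic example can be reproduced with a short script. This is a minimal sketch, not part of the original slides: the function name `polling_overhead` and its parameter names are made up for illustration, and it assumes decimal units (1 KB = 1000 bytes) as the slide does.

```python
def polling_overhead(devices, interfaces_per_device, characteristics_per_interface,
                     data_bytes, overhead_bytes, poll_interval_s):
    """Return per-poll and per-day traffic and storage figures for periodic polling."""
    characteristics = devices * interfaces_per_device * characteristics_per_interface
    bytes_per_poll = characteristics * (data_bytes + overhead_bytes)
    bits_per_poll = bytes_per_poll * 8
    polls_per_day = 24 * 3600 // poll_interval_s
    return {
        "characteristics": characteristics,
        "bytes_per_poll": bytes_per_poll,
        "bits_per_poll": bits_per_poll,
        # Average rate if the poll were spread evenly over the interval.
        "average_bps": bits_per_poll / poll_interval_s,
        "bits_per_day": bits_per_poll * polls_per_day,
        # Only the data bytes are stored, not the protocol overhead.
        "stored_bytes_per_day": characteristics * data_bytes * polls_per_day,
    }


# Values from the slide: 100 routers, 4 interfaces, 8 characteristics,
# 8 data bytes + 60 overhead bytes, polled every 5 seconds.
result = polling_overhead(100, 4, 8, 8, 60, 5)
for name, value in result.items():
    print(f"{name}: {value:,}")
```

With the slide's values this gives 217,600 bytes (about 1.74 Mb) per poll and an average of about 348 Kbps; the daily traffic comes out slightly above 30 Gb because the script does not round the per-poll figure to 1.74 Mb first.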
Monitoring for Trend Analysis
Trend analysis: determines long-term network behaviour and trends.
- Mostly uses the same end-to-end, per-link, and per-element characteristics.
- Helpful in planning for future network growth.

Instrumentation Mechanisms
Instrumentation: the set of utilities and tools needed to probe the network.
- Instruments (hardware or software) do the monitoring: SNMP, ping, traceroute, etc.
Example: we need monitoring for the Interface MIB ifTable:
    ifInOctets        Number of bytes received
    ifOutOctets       Number of bytes sent
    ifInUcastPkts     Number of unicast packets received
    ifInNUcastPkts    Number of multicast/broadcast packets received
    ifOutNUcastPkts   Number of multicast/broadcast packets sent
    ifInErrors        Number of erroneous packets received
    ifOutErrors       Number of packets that could not be sent
plus
    ifOperStatus      State of an interface (up, down, testing)
These are used for short-term event monitoring and long-term trend analysis, including availability.
The instrumentation needed to gather the above must be specified for each type of device: forwarding elements (routers, switches, etc.), pass-through elements (DSUs, bridges), and passive devices (RMON).
Instrumentation Considerations
Instruments need to be dependable, especially during crashes or problem situations.
- Many of today's networks don't have robust and dependable instrumentation.
- How?
  - Physically separate management components
  - Replicate management components
Instrumentation needs to produce accurate results:
- E.g., taking alternate measurements of the same parameter at different points in the network should give the same answer.

Configuration Mechanisms
Configuration: setting parameters in network devices for operation and control of the element.
Can be done through:
- SNMP set command
- telnet or command line interface (CLI)
- HTTP
- CORBA
- FTP
Architectural Considerations
The network management process, as part of the overall network architecture process, consists of:
1. Choosing which characteristics of which end host / link / device to monitor / configure
2. Instrumenting the network devices, or adding collection devices, to collect the data
3. Processing the data for display, storage, and reporting
4. Displaying a subset of the results
5. Storing and archiving subsets of the data
All aspects of network management are covered (FCAPS): Fault Management, Configuration Management, Accounting Management, Performance Management, Security Management.
The following must be considered in this architecture:
- In-band and out-of-band management
- Centralized, distributed, and hierarchical management
- Scaling network management traffic
- Checks and balances
- Managing network management data
- MIB selection
- Integration into OSS

In-Band and Out-of-Band (1/2)
In-band: the network management (NM) data flows over the same network that the user traffic uses.
- Simple network management architecture
- In case of network problems, monitoring and troubleshooting may be difficult
Out-of-band: separate paths are provided for NM traffic and user traffic.
- ISDN D-channel
- Separate Frame Relay/ATM virtual circuit
- Telephone lines
In-Band and Out-of-Band (2/2)
In-band cons:
- Troubleshooting is adversely affected if data flows are delayed or blocked, which can happen during trouble times.
- Event monitoring when the network is under stress, such as during congestion, can also be impacted negatively.
Out-of-band cons:
- Extra equipment and networking resources are needed.
- The speed of monitoring might not match the speed of the actual network (especially if costs were reduced in the installation of the management network).
- A separate method to check the availability of the management network is needed.
Compromise: a hybrid approach.

Centralized and Distributed Management
Centralized: a single management system (not shown here).
Distributed: multiple, separate components of the management system.
- Local monitoring: each component is a complete system for its local subnet.
- Distributed monitoring: components monitor different things and exchange data among themselves for distributed decision making.
Hierarchical Management
Hierarchical: monitoring, display, storage, and processing are separated and placed on separate devices.
Advantages:
- Can substantially reduce management traffic overhead: localized monitoring devices can process and filter data, sending only relevant data.
- Redundancy is easier and cheaper, since it's at the component level.

Scaling of Network Management Traffic
Rule 1: for a LAN, start with one monitoring device per IP subnet. For each, estimate:
- Number of user and network devices to be polled
- Average number of interfaces per device
- Number of parameters to be collected
- Frequency of polling
The combined rate should not be more than 10% of the capacity. For Ethernet, keep it at 2-5%.
Rule 2: for a WAN, start with one monitoring device per WAN/LAN interface.
- This is in addition to any monitoring devices from Rule 1.
- Dual-role devices (doing both Rules 1 and 2) are allowed.
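Rule 1 can be checked numerically once the four estimates are available. The sketch below is illustrative only: the function name, the 68 bytes per parameter (borrowed from the earlier traffic example), and the 10 Mbps link capacity are assumptions, not values given on this slide.

```python
def management_load_fraction(devices, interfaces_per_device, parameters_per_interface,
                             bytes_per_parameter, poll_interval_s, link_capacity_bps):
    """Return the average management traffic as a fraction of link capacity."""
    bits_per_poll = (devices * interfaces_per_device * parameters_per_interface
                     * bytes_per_parameter * 8)
    average_bps = bits_per_poll / poll_interval_s
    return average_bps / link_capacity_bps


# Hypothetical subnet: 100 devices, 4 interfaces each, 8 parameters each,
# 68 bytes per parameter, polled every 5 s over a 10 Mbps Ethernet segment.
fraction = management_load_fraction(100, 4, 8, 68, 5, 10_000_000)
print(f"Management load: {fraction:.2%} of capacity")
print("Within the 10% ceiling:", fraction <= 0.10)
print("Within the 5% Ethernet guideline:", fraction <= 0.05)
```

Note that this is an average rate; since polling traffic tends to arrive in bursts, the instantaneous load can be considerably higher than the fraction computed here.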
Checks and Balances
Refers to methods that duplicate measurements in order to verify and validate management data.
- This adds overhead, but it's advisable to have more than one method of collecting management data, particularly for data considered vital to the proper operation of the network.
Objective: to locate and identify:
- Errors in recording or presenting data
- Rollovers of counters (a counter value returning to zero without proper notification)
- Changes in MIB variables from one version to another
Example: do direct SNMP polling of a device, and double-check against RMON polling.

Managing Network Management Data (1/3)
Flows of management data typically consist of SNMP parameter names and values, and results of queries from utilities (ping, tracert, etc.).
- These consist of frequent event notifications and less frequent trend analysis data. (Some data are used for both.)
Rule 1: Local storage vs. archival: data needed for quick retrieval, for event analysis and short-term trend analysis, should be stored locally. Other data can be archived.
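Two of the checks listed under Checks and Balances lend themselves to a short illustration: handling a counter rollover, and comparing two independent measurements of the same quantity. This is a minimal sketch under stated assumptions: the counter is a 32-bit SNMP Counter32 that wraps at 2^32, at most one wrap occurs between two polls, and the function names and the 1% tolerance are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to zero after 2^32 - 1


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Return the increase between two counter readings, allowing for one wrap."""
    return (current - previous) % modulus


def cross_check(primary, secondary, tolerance=0.01):
    """Return True when two measurements agree within a relative tolerance."""
    reference = max(abs(primary), abs(secondary))
    if reference == 0:
        return True
    return abs(primary - secondary) / reference <= tolerance


# Normal case: the counter simply increased.
print(counter_delta(100, 250))
# Rollover case: the counter wrapped between the two polls.
print(counter_delta(4_294_967_000, 500))
# Compare a direct SNMP reading against a second source, such as an RMON probe.
print(cross_check(1000, 1005))
print(cross_check(1000, 1200))
```

If the polling interval is long enough for the counter to wrap more than once, the modular difference silently undercounts, which is one reason a second, independent measurement is worth its overhead.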
Managing Network Management Data (2/3)
Rule 2: Selective copying of data: for dual-role data (event and trend), consider copying every Nth iteration of that parameter for archival purposes, where N is small enough to allow for trend analysis yet large enough to keep the storage size reasonable.
Rule 3: Metadata: information about the data itself, such as references to data types, time stamps, and pointers. This should be stored too, to make searching and indexing easier.

Managing Network Management Data (3/3)
Rule 4: Data migration: trend data can be kept on local storage and migrated to archival storage when traffic levels are low, such as after hours, in order to reduce network stress.
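Rule 2 (selective copying) amounts to downsampling the locally stored series before archiving it. The following is a minimal sketch; the function name `select_for_archive`, the `(timestamp, value)` sample layout, and the choice of N = 4 are assumptions made for illustration only.

```python
def select_for_archive(samples, n):
    """Return every n-th sample, starting with the first, for archival storage."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return samples[::n]


# Hypothetical local data: (timestamp in seconds, counter value), polled every 5 s.
local_samples = [(t, 1000 + 3 * t) for t in range(0, 60, 5)]

archived = select_for_archive(local_samples, 4)
print(len(local_samples), "local samples ->", len(archived), "archived samples")
print(archived)
```

Keeping the time stamp with each archived value is a simple form of the metadata described in Rule 3, since the archived series no longer has a uniform 5-second spacing that could be inferred from position alone.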
MIB Selection and OSS Integration
MIB selection: selecting which MIBs to use, as well as exactly which MIB objects to monitor and configure.
- E.g., full MIB II, a subset of MIB II, an enterprise-specific MIB, a combination, etc.
Integration into OSS: how is management to be integrated with the operations support system (OSS)?
- Often called the northbound interface, as you go from network elements to higher levels (see the layers in "Defining Network Management").

Exercise
How much storage capacity is required for the following network management configuration?

Element management system:
- All devices polled every 15 seconds
- 100% of polled data stored
- Storage needed for 2 years of continuous polling

User devices:
- 1500 user devices
- 1 interface per device
- 6 variables per interface
- 64 bytes per variable

Network devices:
- 25 network devices
- 4 interfaces per device
- 10 variables per interface
- 64 bytes per variable

Solution:
- User devices generate 1500 x 1 x (6 x 64) = 576,000 bytes per poll.
- Network devices generate 25 x 4 x (10 x 64) = 64,000 bytes per poll.
- Aggregate: 640,000 bytes per poll.
- Four polls per minute = 2,560,000 bytes per minute.
- Annual data generated by network management polling = 2,560,000 x 60 x 24 x 365 = 1,345,536,000,000 bytes per year.
- Two years' worth of network management data: about 2.69 TB.
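The exercise can be verified with a few lines of code. This is a sketch for checking the arithmetic only; the helper name `bytes_per_poll` is made up, and it assumes 365-day years and decimal terabytes (1 TB = 10^12 bytes).

```python
def bytes_per_poll(devices, interfaces_per_device, variables_per_interface,
                   bytes_per_variable):
    """Return the number of bytes collected from one class of devices per poll."""
    return (devices * interfaces_per_device
            * variables_per_interface * bytes_per_variable)


user_bytes = bytes_per_poll(1500, 1, 6, 64)       # user devices
network_bytes = bytes_per_poll(25, 4, 10, 64)     # network devices
total_per_poll = user_bytes + network_bytes

polls_per_minute = 60 // 15                       # one poll every 15 seconds
bytes_per_minute = total_per_poll * polls_per_minute
bytes_per_year = bytes_per_minute * 60 * 24 * 365
bytes_two_years = 2 * bytes_per_year

print(f"Per poll:   {total_per_poll:,} bytes")
print(f"Per minute: {bytes_per_minute:,} bytes")
print(f"Per year:   {bytes_per_year:,} bytes")
print(f"Two years:  {bytes_two_years / 1e12:.2f} TB")
```

Changing the polling interval or the fraction of data stored is then a one-line edit, which makes it easy to see how quickly storage needs grow as polling becomes more frequent.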