Appendix A
ICT Disaster Recovery Plan

Definition of a Disaster

A computer disaster is the occurrence of any computer system or associated event which causes the interruption of business, leading in the short or long term to business losses in any form. The IT disaster plan needs to be followed in conjunction with any Corporate Business Recovery plans in place.

There are many different scenarios which would constitute a disaster within the IT Service of the Council, depending on the seriousness and duration of the event.

Levels of a Disaster

These can be broadly broken down by severity.

Level One

Those events which affect a small number of Users for a short period of time, such as the following:

- Hardware malfunction which does not preclude the majority of systems from being processed.
- Breakdown of part of the distributed network of PCs and terminals, for whatever reason.
- Unavailability of a particular application due to software malfunction.
- Staff shortage which reduces the production schedule.

All of the above would normally be expected to be corrected within a short period of time (6 hours maximum).

Level Two

Those events which affect a large number of Users for a short period of time. Examples of these are as follows:

- Interruption of power supply
- Prevention of access, due to events elsewhere within the council building
- Major system breakdown
- Total network unavailability
- Virus attack leading to denial of service

Level Three

Those events which affect a large number of Users for a long period of time. These are generally considered to be major disasters and would include the following:

- Prolonged denial of access for whatever reason (bomb threat etc.)
- Hardware destroyed by fire, flood etc.
- Total long-term loss of communications networks

Criticality of Applications

The levels of a disaster as given above may be further influenced by the criticality of the application being affected.
The following list of priorities is suggested as a starting point for comparisons, depending on the requirements of the Council.

Priority One

Applications involving urgent payments. The main systems involved in this category would be:

- Payroll
- Housing Benefit payments

All of these would require urgent payments to be made, regardless of whether or not a computer system was available.
Priority Two

Applications involving other payments, cash collection or income generation, or a front-end public service function. These applications would include the following:

- Payments to Creditors/Suppliers
- In-house cash receipting for all payments
- Council Tax billing

Applications involving public service

Although still of major importance, the delays involved for such systems may not become critical for a period, provided reasonable manual procedures were implemented. These would include the following applications:

- Environmental Health
- Housing
- Planning and Building Control
- Highways (street lighting etc.)

An exception to this group may be the Electoral Register, depending on whether or not an election was due.

Priority Three

Applications for administrative and control purposes. Although highly inconvenient, a loss of functionality of these systems would not cause significant loss if they were inoperable for a period of time, as long as manual controls were in place. Examples of these are:

- Office Automation
- Accountancy

It follows from the above details that all Services should be aware of the nature of the disaster and the subsequent loss of ICT facilities; alternative back-up processes should therefore be in place.

3. Preventive Measures

No one would disagree that "prevention is better than cure", but it is not always obvious what preventive measures can be taken, nor indeed whether the cost associated with those measures is worthwhile. As with all insurance, risk assessments have to be made and measures introduced accordingly. Risks to operational and strategic services are considered in the corporate risk assessment process. In the main, the major preventive measures are already in place in this Council's IT Service.

Measures already in place

This Council has already installed or put into action the following:

- Restricted access to Computer Suite
- Dedicated power supply?
- No food or drink within the Computer Suite
- Additional fire extinguishers
- Automatic communication to Fire Service
- Fire alarm warning connected to main alarm system
- Uninterruptible power supplies for all major equipment
- Improved network cabling resilience and cable management
- Duplicated hardware and software systems (fault-tolerant hardware and software)
- Air-conditioning plant cut-out
- Control of air-conditioning water supply
- Fire-proof safe for on-site data storage
- Comprehensive off-site data storage
- Hardware maintenance contracts for all equipment
- Internal security: passwords, restricted access to Users etc. Prevention against "hacking" by unauthorised users
- 24-hour virus checking software for both internal and external emails
- Firewalls and web filtering and monitoring
- Good housekeeping: general safety precautions
Disaster Recovery

Recovery of Data

Having taken all possible precautions against the occurrence of a disaster, the next area that requires careful planning is disaster recovery. As previously explained, there can be many different disaster scenarios of varying degrees of seriousness, but they all involve similar recovery procedures; the difference lies in the relative scale of the disaster.

Recovery of Data - Software and Applications

The security of data is perhaps the most obvious area where preparations can be put into place for reasonable recovery. Most people are aware of the vulnerability of data, although few individuals will do very much to provide adequate security. This is one area where corporate systems and networks can prove invaluable, as responsibility for data security is usually taken away from the User and is actioned by professional IT staff using the appropriate equipment.

Data Security Within the IT Service

The IT Service dedicates a significant amount of operational time to taking security copies of data (software and applications) in various ways. Copies of very large databases are taken several times throughout the day and night to ensure that, should a "disaster" occur, the minimum time is lost in recovering from the event. This event may require a simple re-run of a job, because a User or operator input incorrect information, or it may involve the recovery of the data after a severe corruption due to fire or other incident.

In the main, data security is achieved by copying from live data files, which generally reside on on-line disks, to magnetic tape media of one kind or another which can be removed from the system and stored elsewhere, in fire-proof storage or as off-site security.
Within the IT Service there are three levels of security copies, each of which may contain several different copies of the data:

Level One

This comprises those copies of data which have been most recently taken and which are kept available within the computer room.

Level Two

These are copies of all data taken each day; the most recently available are stored overnight within a fire-proof safe.

Level Three

On a weekly basis the latest copy of data is stored in an off-site storage area to ensure that alternative copies of data are available.

Security copies are taken from many sources, covering each of the different operating platforms that we use.

Replacement of Hardware

With most disaster situations, one of the elements of the recovery plan will be the replacement of computer hardware which has either failed for some reason or has been destroyed by fire or some other means. The range of hardware involved is so diverse and so numerous that it is not possible to have duplicated hardware in place; it is therefore necessary to seek other means to replace the hardware.

Contracts/Agreements in place

There are agreements in place with various suppliers to minimise the risk of prolonged server downtime. Our primary contract for hardware replacement is with NDR. The contract has been procured through a partnership arrangement with 7 other local authorities in Kent and gives hardware replacement cover for all critical systems in TWBC. On-site and off-site testing is included, and a programme of recovery tests is planned throughout the partnership. This contract also offers a number of seats in a recovery room provided by NDR in the event that access to the Town Hall is restricted.
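The three levels of security copies described above imply a simple rule for recovery: restore from the most recent copy whose storage location has survived the incident. The following is a minimal sketch of that rule only, not a description of the IT Service's actual tooling. The names SecurityCopy and pick_restore_copy, the location labels and all timestamps are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class SecurityCopy:
    level: int            # 1 = computer room, 2 = fire-proof safe, 3 = off-site
    location: str         # illustrative label for where the copy is stored
    taken_at: datetime    # when the copy was taken (invented values below)


def pick_restore_copy(copies: Iterable[SecurityCopy],
                      lost_locations: Set[str]) -> Optional[SecurityCopy]:
    """Return the most recent copy stored somewhere that was not lost."""
    usable = [c for c in copies if c.location not in lost_locations]
    if not usable:
        return None
    return max(usable, key=lambda c: c.taken_at)


# Illustrative data: one copy per level, as in the three-level scheme.
copies = [
    SecurityCopy(1, "computer room", datetime(2024, 3, 14, 15, 0)),
    SecurityCopy(2, "fire-proof safe", datetime(2024, 3, 13, 23, 0)),
    SecurityCopy(3, "off-site storage", datetime(2024, 3, 8, 23, 0)),
]

# A fire in the computer room leaves the safe and the off-site store usable.
chosen = pick_restore_copy(copies, {"computer room"})
print(chosen.level if chosen else "no usable copy")
```

The sketch makes the trade-off visible: the further down the levels recovery has to go, the older the restored data is likely to be, which is why the off-site copy acts as a last resort rather than the first choice.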
Other Services' Hardware Requirements

Depending upon the extent of the disaster, it may also be necessary to obtain other distributed hardware such as PC equipment and terminals, as well as other communications hardware such as modems, switches and routers. It is not considered that this would be a problem, as all of this equipment is readily available in the market place and the costs involved would be covered by insurance provision.

Alternative Sites

For all but the major disasters it is unlikely that the actual Computer Room Suite would be affected, and therefore the provision of an alternative site would not be required. However, should a major disaster occur, such as a fire or flood, it is quite likely that the computer site would not be usable, at least in the short to medium term. In these instances an alternative site would be required to provide somewhere to install the repaired or replacement equipment.

In the event of such a disaster, or significant Denial of Access to the Town Hall, guidance should be sought from the corporate business recovery plan or technical emergency plan to facilitate, in liaison with our primary disaster recovery contractor (NDR), the rapid deployment of ICT systems at alternative sites.

Connection of Communication Links

As all our local area networks are available anywhere on the network, and from the internet, there should be no issues regarding access as long as the integrity of the network is proved. Connection to remote links, such as the Cranbrook offices, may need to be negotiated with British Telecom.
Actions in the event of a disaster occurring

- Identify the level and extent of the disaster
  - What has happened?
  - Where has it happened?
  - What people are affected/hurt etc?
  - What equipment is affected/destroyed?
  - What premises are affected and how severely?
- Which computer services have been affected?
  - Are they priority 1, 2, 3 or 4 or a combination of these?
  - What is the order of priority?
  - How long will these services be affected: short, medium or long term?
- Notify people involved with the Disaster Recovery Plan (DRP)
  - IT staff: who will do what?
  - Senior Management
  - Support Services? NDR, Insite etc.
  - Equipment Suppliers?
  - External technical support services? i.e. specific application support
- Notify people involved with the Disaster
  - Users affected, by priority?
  - Customers affected, via Users?
  - Arrange for Users to invoke alternative/manual procedures?
- Arrange for alternative site preparation
  - Inform suppliers of requirement?
  - Prepare site for portable unit?
  - Arrange for power supply connections?
  - Arrange for alternative communications
  - Provide for air-conditioning equipment?
- Contact equipment supplier?
  - Inform supplier what equipment required, in priority order?
  - Obtain possible dates of delivery and publish to those affected?
- What recovery can be made?
  - Is any equipment usable if moved?
  - Can recovery work begin?
  - Locate recovery security copies of data and software?
- Should the public be informed?
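The first action in the checklist, identifying the level of the disaster, follows from the definitions of Levels One to Three given earlier in this appendix. A minimal sketch of that classification is shown below. It rests on stated assumptions: the function name disaster_level is hypothetical, the plan gives "6 hours maximum" only for Level One and the sketch assumes the same cut-off for a "short period" at Level Two, and deciding what counts as a "large number of Users" is left to the caller because the plan gives no figure.

```python
from typing import Optional

# "Short period" cut-off: taken from the 6-hour maximum stated for Level One.
# Applying it to Level Two as well is an assumption of this sketch.
SHORT_PERIOD_HOURS = 6


def disaster_level(large_number_of_users: bool,
                   expected_hours: float) -> Optional[int]:
    """Map the two factors used in the plan onto Level One, Two or Three.

    Returns None for a small number of Users affected for a long period,
    a combination the plan does not define.
    """
    short_period = expected_hours <= SHORT_PERIOD_HOURS
    if short_period:
        return 2 if large_number_of_users else 1
    if large_number_of_users:
        return 3
    return None


# Example: total network unavailability expected to last about two hours.
print(disaster_level(large_number_of_users=True, expected_hours=2))
```

Returning None rather than guessing a level keeps the undefined case visible, so that it is escalated to a person instead of being silently classified.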
Disaster Recovery First Steps

- Identify the level and extent of the disaster
  - What has happened?
  - Where has it happened?
  - Who is affected?
- Identify priority services
- Notify relevant parties regarding extent and complexity of disaster
  - IT Staff
  - External support services
  - Senior managers
  - Customers/Users
- Invoke Business Continuity Plan and Emergency Plan if required
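The step "Identify priority services" can be illustrated by ordering the affected applications according to the priorities suggested in the Criticality of Applications section. The sketch below is illustrative only: the names APPLICATION_PRIORITY and recovery_order are hypothetical, the public service applications are placed at Priority Two because that is where the text lists them, and the Electoral Register is omitted because its priority depends on whether an election is due.

```python
from typing import Dict, List

# Priorities as suggested in the Criticality of Applications section.
APPLICATION_PRIORITY: Dict[str, int] = {
    "Payroll": 1,
    "Housing Benefit payments": 1,
    "Payments to Creditors/Suppliers": 2,
    "In-house cash receipting": 2,
    "Council Tax billing": 2,
    "Environmental Health": 2,
    "Housing": 2,
    "Planning and Building Control": 2,
    "Highways": 2,
    "Office Automation": 3,
    "Accountancy": 3,
}


def recovery_order(affected: List[str]) -> List[str]:
    """Return the affected applications with the highest priority first.

    Applications of equal priority keep the order in which they were given,
    because Python's sorted() is stable.
    """
    unknown = [name for name in affected if name not in APPLICATION_PRIORITY]
    if unknown:
        raise ValueError(f"No priority recorded for: {', '.join(unknown)}")
    return sorted(affected, key=lambda name: APPLICATION_PRIORITY[name])


affected = ["Accountancy", "Council Tax billing", "Payroll", "Housing"]
for name in recovery_order(affected):
    print(APPLICATION_PRIORITY[name], name)
```

Raising an error for an application with no recorded priority is a deliberate choice in this sketch: an unlisted system should prompt a decision rather than be placed in the queue by default.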