1 Background

1.1 Healthcare organisations are generating, and therefore storing, an increasing amount of information in electronic format. This is an eclectic mix of email, Microsoft Office documents and images, as well as the textual information contained in databases.

1.2 The increasing volume and variety of this data presents a growing challenge to IT departments in all organisations, which must manage capacity and back up, retrieve and relocate data as required. This is particularly onerous in a disaster recovery scenario, where recovery must be achieved in the reliable, consistent and timely manner that the end user community requires.

1.3 The graph below shows the growth of user data on file servers in the Hospitals Trust. Whilst this does not include data on the many email, database and clinical servers, it does give an indication of the rapid growth we have experienced over the past 5 years. Data storage requirements are doubling year on year; if the current trend continues, file server data alone will require 18.7TB by 2010 (TB = Terabyte = 1,000 Gigabytes).

[Graph: Annual Storage Growth and Total Managed Storage, in GB (axis 0 to 700), by year, 2000 to 2005]

1.4 Total data storage for the county healthcare organisations, including file server, email, database and some clinical systems, currently consumes 20TB of storage across 300+ servers. Industry best practice for managing data in these volumes while maintaining or improving efficiency is to use a Storage Area Network (SAN). Applying the same rule of data storage doubling yearly, we will require 640TB of storage by 2010. Clearly, we need not only to be intelligent in how we store and rationalise data but also to reduce the amount of data stored where it is not essential.
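The projections in 1.3 and 1.4 follow from simple compound doubling. The sketch below reproduces the arithmetic; it assumes 2005 (the last year shown on the graph) as the base year, and the function and variable names are illustrative only.

```python
def projected_storage_tb(current_tb, years, growth_factor=2.0):
    """Compound growth: storage multiplies by growth_factor every year."""
    return current_tb * growth_factor ** years


BASE_YEAR = 2005    # assumption: the last year shown on the graph
TARGET_YEAR = 2010
years = TARGET_YEAR - BASE_YEAR

# County-wide storage: 20 TB today, doubling each year until 2010.
county_total_2010_tb = projected_storage_tb(20, years)

# Working backwards from the 18.7 TB file server projection gives the
# 2005 starting point that the projection implies.
implied_file_server_2005_tb = 18.7 / 2.0 ** years

print(f"County total in {TARGET_YEAR}: {county_total_2010_tb:.0f} TB")
print(f"Implied file server data in {BASE_YEAR}: "
      f"{implied_file_server_2005_tb * 1000:.0f} GB")
```

Under these assumptions the county-wide figure comes out at 640 TB, and the implied 2005 file server starting point is a little under 600 GB, which sits within the range of the graph's vertical axis.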
1.5 As the delivery of efficient clinical services becomes increasingly dependent on the availability of electronic data to support patient care, the demand for effective data storage and retrieval is demonstrably increasing.

1.6 There are additional pressures, such as the need for an effective means of recovering electronic data to meet growing demands in respect of Freedom of Information, or of locating and recovering data in respect of potential litigation, e.g. we have already had several enquiries to recover all data in respect of a given patient.

SAN Page 1
1.7 From any objective analysis, and apart from specific examples such as PACS, the current storage infrastructure in use within the county is inadequate in all these respects, and a new approach to managing the electronic storage requirements of the healthcare community is required.

1.8 Where SAN-type mass storage technology has already been used for specific solutions, e.g. Hospitals Trust PACS and County Finance Systems, it has proved highly effective.

1.9 The goal is to provide a general purpose solution to support the operational storage needs of the Hospitals Trust, and potentially the entire healthcare community, and to address the strategic needs of Disaster Recovery / Business Continuity and cost effectiveness in service delivery.

2 Strategic Context

2.1 The IT strategy being pursued by the Countywide service is to simplify, rationalise and improve the capacity of core systems and supporting infrastructure. There is an underpinning goal to remove overhead (cost) from operational support in order to transfer resources into Development, i.e. value added activity.

2.2 Accordingly, a number of aspects of the current environment require improvement in order to deliver the IT Technical Strategy as presented to County and Trust Boards.

2.3 The key components of the technical strategy depend on the deployment of a number of complementary technologies / processes, namely:

- A highly resilient, high speed local network infrastructure. This is now very largely deployed in the GHT and to varying degrees in the PT/PCT environment.
- A modern desktop PC environment, the so-called warranted environment. All organisations can take pride in the ongoing improvement and maintenance of the PC environment in preparation for NCRS and other NPfIT initiatives.
- Server virtualisation. Initial deployment has commenced within GHT, with every expectation that it will be expanded to other organisations as the cost and operational benefits are realised.
- SAN storage. A general purpose SAN solution is an essential prerequisite to widespread deployment of virtualisation technology, in order to support the number of servers likely to be replaced. It is in this area that the community is currently weakest, with no coherent storage strategy going forward. This has been highlighted by the merger of county IT organisations and the sheer number of individual storage servers to be managed.

3 Issues

3.1 Data is stored across many systems (300+) that all require individual backups. Storage on each system is inherently fixed, and therefore effective management of disk utilisation is key to ensuring service availability. As use of electronic data grows and new requirements are identified, the storage requirement and operational overhead will continue to grow.

3.2 Internal Drivers for SAN Technology

- 300+ servers, all with their own storage, make management and backups a complex and time consuming task. Data volumes, and thus backup times, are ever increasing, and it is increasingly problematic to complete backups on all 300 servers in the available overnight window.
- When storage runs out on one server, additional space must be procured. A SAN enables storage to be dynamically allocated based on need and thus reduces the operational overhead of monitoring, purchasing and installing disk space. It minimises the overall cost of storage through efficient utilisation. When storage is managed individually on many servers, a greater safety margin tends to be held on each server because of the lead time required to address any shortfall; a SAN instead provides a central pool of available storage which can be allocated as required.
- Part of the proposed solution is to implement a virtual tape library, which will extend the backup window to 20+ hours rather than just overnight. It will also reduce the number of disparate systems that need individual licensing, backing up and management.
- As the need for additional computing capacity / IT services increases, the number of servers will also increase. Without an alternative to the traditional approach this will continue to exacerbate the operational burden, where there is a linear requirement for additional staff, i.e. more servers means more staff are required for support. SAN storage will have the opposite impact, reducing the operational support overhead and therefore releasing enough resource to continue to manage the growing service and avoid future recruitment costs.
- The recent storage audit found that 20 critical servers are at or near capacity (100% utilised) and will therefore need to be replaced with new, higher capacity servers in the coming year. SAN technology facilitates extending the life of such servers, as they will only be needed for processing power and the storage problem will be resolved by the centralised SAN disk space. A server costs on average around £5k, so replacement costs of up to £100k for these servers would be avoided.
- Virtualisation will reduce the number of physical servers, but it requires SAN storage on which to store the virtual machines. The expansion of the virtualisation project (past the initial 20 machines) will not be able to progress without significant additional storage.

3.3 We are increasingly seeing projects with an inherent need for significant storage. Purchasing this on a piecemeal basis is neither cost effective nor resource effective. Some current examples include:

- The CRS Integration project is being implemented by an outside company (project lead Tony Dennis), which has produced a plan that assumes a resilient SAN environment will be available. This project will not move forward without a SAN unless expensive (i.e. high operational overhead) disk solutions are procured.
- The CRS Data Migration project requires 2TB (TB = Terabyte = 1,000 Gigabytes) of storage for a SQL Server. Again, this will be via an expensive, high operational overhead disk solution unless we implement the technically and organisationally better option of a SAN.

3.4 External Drivers for SAN Technology

- Disaster Recovery and Business Continuity. Internal and external auditors have been critical of the healthcare community's commitment and capacity to address these issues. In light of the previously expressed and growing demand for IT services, Audit regards DR and Business Continuity as the major issue to be addressed within the IT infrastructure.
- Increasing legislation in the areas of Freedom of Information and Data Protection is increasing the pressure to manage electronic data effectively. SAN technology, whilst not a solution to these issues in itself, is a strategic tool to assist.

3.5 The IT Department has made many point improvements in addressing technical DR capability, and these improvements have been demonstrated as effective on a number of occasions in the past 12 months. However, we have not addressed the organisational needs for Business Continuity in a structured fashion.

3.6 Accordingly, we remain vulnerable to a major disaster such as the loss of a key facility or of a key server or servers. With our traditional computing model, re-creating and restoring a system after a failure requires duplicate hardware and a lengthy restore from backup.

3.7 This was amply and painfully demonstrated to a large group of system users following the major disk failure experienced on the County Finance Systems some 2 years ago.

3.8 A SAN, particularly in combination with virtualisation technology, means that a given machine can be rapidly recreated with storage on line, without the need for exact duplication of hardware or extensive manual re-configuration of the recovered data.

3.9 Ultimately, in order to support the business continuity requirements of the various healthcare organisations in all but Armageddon scenarios (such as natural disaster, civil unrest or war affecting a very wide geographical area), the underpinning IT environment should be composed of resilient systems across multiple sites to ensure critical services are always available, even if sometimes with less than normal capacity.
3.10 The SAN solution will provide replicated disk storage, so that critical systems can have their data held in dual data centres located in Cheltenham and Gloucester. This means that if a site loses connectivity to its primary centre, it can be connected to the copy of the data on the alternative resilient system.

4 Benefits

4.1 Centralised storage will reduce the number of devices and backup software licences required. It will minimise the operational overhead of managing disparate storage on many systems and will make efficient and effective use of the central storage resource.

4.2 Replication of the data will greatly aid the organisation's disaster recovery ability and mitigate the risk posed by the increasing reliance on electronically stored data.

4.3 Low operational overhead. A SAN will enable centralisation of data. Currently 300+ servers are individually managed, but as data is moved to the SAN the overhead of managing the disparate chunks of data will begin to go away.

4.4 Efficient use of storage, which minimises overall cost. Currently some servers are completely out of space and need additional storage to be procured, whereas other servers have a significant quantity of unused storage that is not available to the servers that badly need it. Moving to a centralised storage model will make more efficient use of data storage: servers that don't store significant amounts of data, or whose storage requirements don't grow rapidly, will not hold on to a large amount of unused space. Instead the space will be in the central pool, available to systems needing it.
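The effect described in 4.4 can be illustrated with a small sketch. The server names and figures below are hypothetical, chosen only to show the calculation; they are not taken from the storage audit.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity_gb: int
    used_gb: int


def summarise(servers):
    """Return (servers that are full, free space stranded on the other
    servers in GB, utilisation if all capacity were one shared pool)."""
    full = [s.name for s in servers if s.used_gb >= s.capacity_gb]
    stranded_gb = sum(s.capacity_gb - s.used_gb
                      for s in servers if s.used_gb < s.capacity_gb)
    total_capacity = sum(s.capacity_gb for s in servers)
    total_used = sum(s.used_gb for s in servers)
    return full, stranded_gb, total_used / total_capacity


# Hypothetical estate: two servers are full while two others sit mostly empty.
servers = [
    Server("file-01", 500, 500),
    Server("file-02", 500, 120),
    Server("db-01", 300, 300),
    Server("app-01", 300, 60),
]

full_servers, stranded_gb, pooled_utilisation = summarise(servers)
print("Servers out of space:", full_servers)
print("Free space stranded on other servers:", stranded_gb, "GB")
print(f"Utilisation if pooled: {pooled_utilisation:.0%}")
```

In this made-up example two servers would need new disks even though the estate as a whole has spare capacity; with pooled storage the same capacity would be only partly used and nothing would need to be procured.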
4.5 Simpler backups. As much of our data will reside in a single location, backups will be simplified. The time available to carry out backups will be extended by a disk cache library, which will enable us to stream data to tape during the day, having taken a snapshot overnight.

4.6 Key data will be replicated across multiple sites, giving a significant improvement in disaster recovery ability for the county. If a site becomes unavailable, it will be possible to run the services from the second site. The proposal is for GRH and CGH to be the two sites between which data is replicated.

4.7 There is a significant environmental benefit to a SAN / virtualisation environment, as the number of servers, and thus the power, heat generation and cooling requirements, are all substantially reduced. Power savings are conservatively estimated at £40,000 over 5 years.

5 Risks

5.1 Moving to a centralised platform for all data storage (the "all eggs in one basket" syndrome). This will be mitigated by using resilient disk configurations with auto recovery capability, replicating key data to alternate systems, and backing up in several stages, to disk cache and then to tape. Tapes will be stored offsite.

5.2 Speed of space consumption will be a risk, particularly with the current user culture that storage is cheap and unlimited. A method of charging departments / organisations for usage above a defined level is required; this will be considered by the project, along with the need to actively manage data through its life cycle from creation to deletion.

5.3 Network capacity between the resilient sites is likely to be an issue as data volumes grow. This risk is mitigated by the fact that the problem exists even if current storage technologies are retained, given the desire to share data across county organisations. Additionally, the cost of wide area connectivity is on a sustained downward trend.
5.4 Network capacity for access to central storage may be limited from some sites in the county. This risk will be mitigated by the fact that the major conurbations of users have good connectivity, supplemented by the ongoing investment across the community to improve connectivity for the implementation of national applications such as NCRS, PACS etc. and local initiatives such as P2P.
Estimated Cost Comparison Without / With SAN over 5 years

The cost comparison assumes replacement of 20 servers per year (a total of 100 at a cost of £5,400 each) and expansion of storage to 50 terabytes. These assumptions are considered conservative on recent experience. Figures are in £ and exclusive of VAT, and no estimate has been made for expected savings in LAN capacity and staff costs within IT.

Item                                                   Without SAN      With SAN

Capital Costs

Storage Costs
HP SAN Environment including 5 years Support                     -    493,424.38
Additional Storage (to 50 TB)                                    -     41,080.00
50TB of traditional storage                             333,000.00             -
Tape Library                                             18,000.00             -

Server Costs
Replacement Servers                                     540,000.00             -
Additional Blades                                                -     98,635.00
Additional VMware Licences                                       -     63,556.00
Traditional Connectivity                                100,000.00             -
Virtual Connectivity (Included)                                  -          0.00

Total Capital Costs                                     991,000.00    696,695.38

Revenue Costs

Existing Storage Maintenance Contracts (no longer required with a SAN)
EMC Clarion 1                                            31,968.00             -
EMC Clarion 2                                            31,968.00             -
IBM                                                      39,960.00             -
Support, New SAN (Included)                                      -          0.00
Power / Cooling Costs per year                           40,000.00      6,000.00

Total Revenue Costs                                     143,896.00      6,000.00

Total Costs                                           1,134,896.00    702,695.38

Savings with SAN
Capital Savings                                         294,304.62
Revenue Savings                                         137,896.00
Total Savings                                           432,200.62
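The totals and savings in the cost comparison can be cross-checked from the individual line items. The sketch below is a check of the arithmetic only: the assignment of each figure to the "without SAN" or "with SAN" column is inferred from the stated totals, and the variable names are illustrative.

```python
from decimal import Decimal


def total(items):
    """Sum a mapping of line item -> amount, using exact decimal arithmetic."""
    return sum((Decimal(v) for v in items.values()), Decimal("0"))


# Column assignment below is inferred from the stated totals (assumption).
capital_without_san = {
    "50TB of traditional storage": "333000.00",
    "Tape library": "18000.00",
    "Replacement servers (100 at 5,400 each)": "540000.00",
    "Traditional connectivity": "100000.00",
}
capital_with_san = {
    "HP SAN environment incl. 5 years support": "493424.38",
    "Additional storage (to 50 TB)": "41080.00",
    "Additional blades": "98635.00",
    "Additional VMware licences": "63556.00",
    "Virtual connectivity (included)": "0.00",
}
revenue_without_san = {
    "EMC Clarion 1 maintenance": "31968.00",
    "EMC Clarion 2 maintenance": "31968.00",
    "IBM maintenance": "39960.00",
    "Power / cooling": "40000.00",
}
revenue_with_san = {
    "Support, new SAN (included)": "0.00",
    "Power / cooling": "6000.00",
}

capital_savings = total(capital_without_san) - total(capital_with_san)
revenue_savings = total(revenue_without_san) - total(revenue_with_san)
total_savings = capital_savings + revenue_savings

print("Capital savings:", capital_savings)
print("Revenue savings:", revenue_savings)
print("Total savings:  ", total_savings)
```

With this column assignment, the computed subtotals and savings agree with every figure stated in the comparison.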