Confident? Five misconceptions about backup and recovery that put your organisation at risk

This paper includes:
The backup and recovery performance of 48 UK firms polled.
Insight into repeat failure rates and staffing ratios of industry leaders.
Clear advice on how to eradicate weaknesses and overspends.
The backup and recovery performance of 48 UK firms

4sl Group conducted an online poll of 48 UK firms. The research focused on large companies with more than 1,000 employees; almost two thirds have at least 10TB of data to protect and an estate of more than 100 servers, and a third have more than 1,000 servers. We asked them about backup performance, recovery frequency, testing processes, data growth and staffing levels. We expected firms of this size, with serious resources at their disposal, to be getting it right.

So how do they fare on backup and recovery? Not as well as you'd think. By any reasonable definition, many firms are not adequately protecting their information and are exposed to the risk of critical data loss. Almost half (48%) are not reaching the industry-accepted benchmark for backup success. In a full-scale disaster, several would find themselves going back weeks or months to the most recent restore point for some of their data, leading to financial loss, possible fines from the regulator and reputational damage.

The causes are numerous and diverse: it's not all about the infrastructure, nor is it all about the software, the processes or the people, but all contribute. Behind these factors is a common thread: misconception. Misconception of what "good enough" means, and misconception about risk. This guide explores five key misconceptions identified by the research and looks at how they can be addressed.

But first let's put some context around the challenges these organisations face. Many are experiencing the double whammy of significant server and data growth: one in six are installing at least 20% more servers per annum, and a whopping nine out of ten are trying to cope with more than 10% data growth, with two out of ten at more than 25%. That inevitably puts a strain on backup and recovery in terms of the number and duration of backup jobs, capacity and an increasing volume of restores.
48% of firms failed to achieve the industry benchmark for backup success
Five misconceptions that even the largest organisations fall foul of

1 If we hit 98% backup success, that's good enough
2 It's all about the backup
3 We don't need to do DR testing
4 Backup policies aren't important; what matters is ensuring everything is being backed up
5 You need about 1 member of staff per 450 servers

Whilst achieving Gartner's recommended 95% backup success rate is not a laurel you should rest on, it does give firms a good starting point. But almost half of those we polled fall below it, with one in ten at least 10% short of the target.

[Chart: How successful are your backups? Response bands: 76-85%, 85-95%, >96%, Don't know]
1 If we hit 98% backup success, that's good enough

95-98% success is indeed the industry benchmark. But it's not about the percentage of jobs that worked; it's about those that didn't. An impressive backup success rate could still be hiding long-term failures that could cause your business enormous pain if the available data is days or weeks old.

Get to the bottom of why backup jobs are failing: focus on those that have been failing longest, as they will cause you the most pain if you need to recover. You should be looking for continuous improvement; the chart below shows real data taken from one of our clients after 4sl took on their backup operation and started resolving their repeat failure problem. It can be hard work, involving network and platform teams and more, but its importance can't be overstated.

[Chart: Real data from a 4sl client. Servers failing >5 days in a row (vertical axis, 0 to 180) by month since the client handed operations to 4sl (horizontal axis, months 1 to 7)]

A further step is to implement proactive monitoring, which brings forward the point at which you start to resolve issues rather than waiting for failure trends to emerge. Whilst most of the firms we sampled do this already, a quarter don't.

[Chart: Do you monitor backups proactively? Responses: Yes, Planned, No, Unknown]
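To make the point about hidden long-term failures concrete, here is a minimal Python sketch. It assumes a hypothetical data structure (a daily success/failure history per server) and hypothetical server names; none of this comes from the poll or from any particular backup product. It shows how an estate can report 98% success while one server has had no good backup for a week.

```python
from typing import Dict, List, Tuple


def success_rate(history: Dict[str, List[bool]]) -> float:
    """Share of successful backup jobs across all servers and days."""
    jobs = [ok for days in history.values() for ok in days]
    return sum(jobs) / len(jobs)


def current_failure_streak(days: List[bool]) -> int:
    """Consecutive failed days, counting back from the most recent day."""
    streak = 0
    for ok in reversed(days):
        if ok:
            break
        streak += 1
    return streak


def repeat_failures(history: Dict[str, List[bool]],
                    threshold: int = 5) -> List[Tuple[str, int]]:
    """Servers failing more than `threshold` days in a row, longest first."""
    streaks = {name: current_failure_streak(days)
               for name, days in history.items()}
    flagged = [(name, s) for name, s in streaks.items() if s > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical estate: 49 healthy servers plus one that has failed
# every night for a week (oldest day first, True = successful backup).
history = {f"srv{i:02d}": [True] * 7 for i in range(49)}
history["srv49"] = [False] * 7

print(f"{success_rate(history):.1%}")  # 98.0%
print(repeat_failures(history))        # [('srv49', 7)]
```

Sorting by streak length mirrors the advice to tackle the longest-running failures first. The threshold of five days is taken from the chart's "failing >5 days in a row" measure and would be a tuning choice in practice.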
2 It's all about the backup

No, it's actually all about the restore. Backing up the data is merely the means to an end; getting it back when you need it is what counts. And our research found that restores are common, with fewer than a third of companies performing fewer than 10 per month.

[Chart: How many restores are you doing per month? Response bands: <10, 10-50, 51-100, >100, Don't know]

Being unable to restore has many causes, but one factor that's often overlooked is the suitability of the software. The wide range of available technologies has an equally wide range of capabilities. Take restoring databases, for example: if the database is 100GB and the software tells you it backed up 100GB, you'd assume the data can be restored. But if the software didn't back the database up in a quiesced state, record processing may have been in progress and records may have been missed. The only way to tell is to check the number of records. Simpler backup technologies don't have this functionality, and you would only find out following an unsuccessful restore.

To find out which technologies are suitable for business, try Forrester Research's review paper Enterprise-Class Backup & Recovery Software, which contains a detailed comparison of capability across operating systems, virtualised platforms and databases. It also looks at emerging features distinguishing the market leaders from the rest.
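The record-count check described above for database restores can be sketched in a few lines. This is an illustration only, using Python's built-in sqlite3 module and two in-memory databases as stand-ins for a source database and its restored copy. The table name and helper functions are hypothetical, and enterprise backup software performs this kind of verification in its own way.

```python
import sqlite3
from typing import Dict, Optional, Tuple


def table_row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return {table_name: row_count} for every user table."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}


def count_mismatches(
    source: sqlite3.Connection, restored: sqlite3.Connection
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Tables whose row counts differ: {table: (source_count, restored_count)}."""
    src, dst = table_row_counts(source), table_row_counts(restored)
    return {t: (src.get(t), dst.get(t))
            for t in sorted(set(src) | set(dst))
            if src.get(t) != dst.get(t)}


# Hypothetical source database with three records.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
source.executemany("INSERT INTO orders (item) VALUES (?)",
                   [("a",), ("b",), ("c",)])

# Hypothetical restored copy that missed a record written during the backup.
restored = sqlite3.connect(":memory:")
restored.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
restored.executemany("INSERT INTO orders (item) VALUES (?)",
                     [("a",), ("b",)])

print(count_mismatches(source, restored))  # {'orders': (3, 2)}
```

An empty result means the counts agree. For the comparison to be meaningful, the source counts would need to be captured at the same point in time as the backup.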
3 We don't need to do DR testing

Shocking as it may seem, this is a common attitude. In our poll, a third of companies admitted they don't test at all, and only half (52%) were able to confirm they test at least once a year. These are worrying findings, particularly when you bear in mind that some are household names.

[Chart: Do you do DR testing? If so, how often? Responses: Don't test; Once a year; Twice a year; More than twice a year; Yes, but don't know how often; Don't know]

Clearly, testing is going to reveal some hard truths if your backup operation is not running well. But does it matter if the reverse is true: your success rate is high, long-term failures are few and your ad hoc restores take place without fuss? The proof of the pudding is in the eating. If you don't carry out a full-scale DR test, how do you actually know how quickly you could get up and running again and how far back you can restore to? Many other variables come into play in a full-scale recovery situation, of course, most notably network, access and the suitability of recovery infrastructure.

Set a Recovery Point Objective (RPO) and a Recovery Time Objective (RTO). The former establishes how recent the acceptable restore point must be, and the latter the elapsed time from the disaster to the restoration of service. They may well be different for each data tranche within your backup policies.

Our staff have been involved in several real DR situations; the biggest involved the datacentre of a well-known IT services company, where around 8% of the servers didn't come back online at all. Like many firms, they didn't understand their RPO or RTO and hadn't tested. Even those you'd most expect to get it right often don't.
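The RPO and RTO definitions above reduce to two subtractions, which the following Python sketch illustrates. The objectives, timestamps and names (`Objectives`, `DrOutcome`, `assess_dr_test`) are hypothetical values chosen for the example, not figures from the poll.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objectives:
    rpo: timedelta  # maximum acceptable age of the restore point
    rto: timedelta  # maximum acceptable time from disaster to restored service


@dataclass(frozen=True)
class DrOutcome:
    achieved_rpo: timedelta
    achieved_rto: timedelta
    rpo_met: bool
    rto_met: bool


def assess_dr_test(objectives: Objectives,
                   last_restore_point: datetime,
                   disaster_at: datetime,
                   service_restored_at: datetime) -> DrOutcome:
    """Compare what a DR test achieved against the stated objectives."""
    achieved_rpo = disaster_at - last_restore_point
    achieved_rto = service_restored_at - disaster_at
    return DrOutcome(achieved_rpo, achieved_rto,
                     achieved_rpo <= objectives.rpo,
                     achieved_rto <= objectives.rto)


# Hypothetical data tranche: 24-hour RPO, 8-hour RTO.
outcome = assess_dr_test(
    Objectives(rpo=timedelta(hours=24), rto=timedelta(hours=8)),
    last_restore_point=datetime(2024, 3, 1, 22, 0),
    disaster_at=datetime(2024, 3, 3, 10, 0),
    service_restored_at=datetime(2024, 3, 3, 16, 0),
)
print(outcome.achieved_rpo, outcome.rpo_met)  # 1 day, 12:00:00 False
print(outcome.achieved_rto, outcome.rto_met)  # 6:00:00 True
```

In this example service came back within the RTO, but the most recent usable restore point was 36 hours old, so the RPO was missed. Only a test that records both timestamps would expose that.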
4 Backup policies aren't important; what matters is ensuring everything is being backed up

An easy trap for the unwary. Why? Because a homogeneous backup policy means you are storing data unnecessarily, increasing capital expenditure on storage and burdening your backup team by retaining data that is never going to be recalled. With growth in data at the levels we found, it is no surprise that many companies experience ongoing capacity management issues and spiralling storage costs.

[Chart: How fast is your data growing? Response bands: <10% p.a., 10-25% p.a., 26-50% p.a.]

Analyse your data into broad categories, then define and implement policies to suit. In a perfect world you should have a service catalogue for backup and recovery of data types, driven by an explicit business requirement for retention. Most companies will never get this far, though: the technologists build a solution and operate it in isolation without determining whether it's fit for purpose.

A further measure for keeping costs under control is to tier your data in accordance with its criticality and age, so you spend money on storage in proportion to the need to get data back quickly. It takes effort and diligence to expire (i.e. delete) data you don't need, but it is nevertheless the correct course of action.
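The idea of per-category policies with tiering and expiry can be expressed as a small lookup. The sketch below is a hedged illustration: the categories, retention periods and tier names are invented for the example and would in practice come from the business's own retention requirements.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Policy:
    retention_days: int   # expire backups older than this
    fast_tier_days: int   # keep backups this recent on fast storage


# Hypothetical policies per data category (all values are assumptions).
POLICIES = {
    "finance": Policy(retention_days=2555, fast_tier_days=90),
    "email": Policy(retention_days=365, fast_tier_days=30),
    "test": Policy(retention_days=30, fast_tier_days=7),
}


def placement(category: str, backup_date: date, today: date) -> str:
    """Return 'fast', 'archive' or 'expire' for a backup of the given age."""
    policy = POLICIES[category]
    age_days = (today - backup_date).days
    if age_days > policy.retention_days:
        return "expire"
    if age_days <= policy.fast_tier_days:
        return "fast"
    return "archive"


today = date(2024, 6, 30)
taken = date(2024, 5, 1)  # every backup below is 60 days old
print(placement("finance", taken, today))  # fast
print(placement("email", taken, today))    # archive
print(placement("test", taken, today))     # expire
```

The same 60-day-old backup lands in three different places depending on its category, which is the saving a homogeneous policy gives up.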
5 You need about 1 member of staff per 450 servers

Well, maybe, but the right answer is "it depends". For a small organisation, 280:1 is typical for a business-hours-only operation. At the other end of the spectrum, a well-drilled operation managing an estate of more than 1,000 servers could achieve 800:1, but a backup services vendor would be striving for over 1,000:1 (4sl's ratio across our entire client base is around 1,025:1). If your company's ratio doesn't compare well to these figures, conduct an operational review to determine what's causing the inefficiency.

[Chart: Number of staff in backup operation. Response bands: 1-5, 6-10, 10-20, 20-50, >50]

But remember, even a small firm needs a minimum of three people with the right skills to cover the service properly: one resource on shift, one who may be out of the office and a third providing cover. Typically they don't all need to be exclusively devoted to backup, but organisations with this model are not going to get economies of scale and may be better off outsourcing. Having a small team presents other challenges too, such as depth of technical knowledge and succession planning.

On a related point, most managers cite staff budget as the cost they focus on, yet industry research has shown that over a quarter of total backup expense goes on hardware and a further 20% on software.
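The staffing arithmetic above is simple enough to check with a short script. This sketch uses the ratios quoted in the text as benchmarks and the minimum team of three as a floor. The function names and the example estate of 1,350 servers are assumptions made for illustration.

```python
import math

# Benchmark ratios quoted in the text (servers per member of staff).
BENCHMARKS = [
    ("small, business-hours-only operation", 280),
    ("well-drilled operation with 1,000+ servers", 800),
    ("backup services vendor", 1000),
]

MINIMUM_TEAM = 3  # one on shift, one possibly out of office, one covering


def servers_per_person(servers: int, staff: float) -> float:
    """Servers managed per member of backup staff."""
    if staff <= 0:
        raise ValueError("staff must be positive")
    return servers / staff


def benchmarks_met(ratio: float) -> list:
    """Labels of the quoted benchmarks that the given ratio reaches."""
    return [label for label, target in BENCHMARKS if ratio >= target]


def staff_needed(servers: int, target_ratio: int) -> int:
    """Headcount for a target ratio, never below the minimum team of three."""
    return max(MINIMUM_TEAM, math.ceil(servers / target_ratio))


# Hypothetical estate: 1,350 servers looked after by 3 people.
ratio = servers_per_person(1350, 3)
print(ratio)                    # 450.0
print(benchmarks_met(ratio))    # ['small, business-hours-only operation']
print(staff_needed(1350, 800))  # 3
```

At 800:1 the arithmetic alone would suggest two people for this estate, but the floor of three applies, which is why small teams struggle to reach the higher ratios.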
How can we help?

Healthcheck
A short engagement to review performance, assess risk and provide a clear action plan for your in-house team or our specialists to execute.

Corporate Data Retention Policy review
A homogeneous policy or no policy fuels unnecessary capital expenditure and drives increasingly unattainable recovery targets. 4sl Group are specialists in defining and implementing policies that are fit for purpose.

Managed backup
You retain your infrastructure, we run the operation, guaranteeing unrivalled service levels and providing enterprise-class reporting for a fixed monthly fee.

Other services include: Solution Roadmap, Design, Implementation, Optimisation, Migration, Cloud Backup

Got a question for us?
If you would like deeper answers on any of the topics covered in this paper, we would be pleased to set up a conference call or short meeting with the appropriate expert. To arrange this, contact Richard Simpkins:
e: Richards@4slgroup.com
t: +44 (0) 207 464 4071

Backup & Recovery Specialists
We manage over 20,000 servers in four continents round the clock, 365 days a year, from operations centres in the UK and India. Our backup success SLA is 98.5% and we target zero repeat failures for all our clients. We regularly achieve >99.5% backup success over the course of a month. We consult on backup and recovery to FTSE100 and Fortune Global 500 companies across the range of enterprise-class technologies.
Contact Us:
For further information, please contact us through our UK, Asia or US offices as below or via email at enquiries@4slgroup.com.

UK
+44 (0) 203 307 1030
4 Snow Hill, London EC1A 2DJ
+44 (0) 1543 404 600
Watling Court, Orbital Plaza, Cannock WS11 0EL

India
+91 44 4282 3443
1/581 A, 2nd Floor Jeeva Complex, 200 Pallavaram Link Road, Thoraipakkam, Chennai 600 097

Singapore
+65 9223 2077
20 Cecil Street, #14-01 Equity Plaza, Singapore 049705

USA
2711 Centerville Road, Suite 120, Wilmington, Delaware 19808 USA