Challenges to Data Center Storage and Networking
Produced by SearchDataCenter.com
Presenter: Greg Schulz

Copyright 2008 Greg Schulz. All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws. Design Copyright 2008 TechTarget. All Rights Reserved.
Dell_06_2008_0003PT
This document is based on a Dell EqualLogic/TechTarget webcast entitled Challenges to Data Center Storage and Networking.

Mark: Please welcome Greg Schulz, founder and senior consulting analyst for the StorageIO Group, and author of Resilient Storage Networks from Elsevier.

Greg Schulz: Thanks, Mark. We will look at some trends and issues today. With that being said, we are essentially going to open the fire hose, and I am going to hit you with a lot of material real quick.

More data is being generated and stored for longer periods of time, and that is resulting in what is called the server capacity performance I/O gap. It is nothing new. It has been around for 30 or 40 plus years, but more recently, with the rapid growth of processing performance and storage capacity, I/O is not keeping up, so it is causing some bandwidth problems. If you want to know more about that, go to my website, www.storageio.com. There is a free paper called "Data Center I/O Performance Issues and Impacts."

So, what is happening here? We have different threat challenges. Certainly there are issues that make the headline news, such as fires in California and floods in other parts of the country. But then there are those that do not make the headline news, the things that may occur on a more regular basis: a file gets deleted, a file gets corrupted, a file gets infected, a file goes bad, causing a database to go off-line. Things
that happen on a more regular basis but do not make the local, national, or international news. Some of these are acts of man, some are acts of nature, some are just errors.

We have this aging infrastructure all around us: bridges, roadways, basic core infrastructure, the national power generation and transmission infrastructure, and even issues involving the data center itself. Does anybody have excess power capacity in your data center? Pretty rare these days. Excess cooling capacity or floor space? There is more and more constraint. So, these are all issues that involve the aging of the infrastructure, not only within IT but all around us.

Then there is compliance. We are here in the Chicago area. Does anybody not believe that the Chicago area is subject to compliance? Well, here is an interesting thing. I get this point regularly, where folks like yourselves tell me, "I am not a financial firm, I am not a medical firm, I am not a payment card industry type firm, so I am not under compliance." But what they do not realize is that there are other aspects to compliance. There are something like 8,000 different versions of compliance regulations besides those that make the headlines: besides Sarbanes-Oxley, HIPAA, CFR, PCI, and the others. So there are a lot of different ones.

There is one notably called FRCP, Federal Rules of Civil Procedure. What the heck does this have to do with storage? It is really simple: if you are in the financial industry, you are probably under a CFR or Sarbanes; if you are in the medical industry, you are under HIPAA; but if you are in engineering or manufacturing, you may not be under any of those, so you think, hmm, no worries, I do not have to retain any data. FRCP is a law that went into effect in December of last year. It is a federal rule for civil procedure that different courts around the country can use as a guideline.
It basically says that if you have to go to court, whether you are a one-person company or a multi-billion dollar, 100,000+ employee international corporation, if requested you have to produce information. The rule does not say you have to store data. It does not say how long you have to keep data. It does not say how you have to keep the data. It just says that if you ever get a summons, and if you have to go to court, and if requested, you have to produce this information. However, because it is a new law, you are allowed essentially one mulligan, a one-time excuse that says, "I did not save my data." It has a very far-reaching impact and a lot of loopholes. Lawyers are still grappling with what it means.

What it means is that organizations that think they are outside of the scope of compliance may want to revisit that. This is something you may have to discuss with your records people and your corporate counselors. Does this mean that, even though we are not under CFR or Sarbanes or HIPAA, maybe we should be saving some things? So, compliance: very far reaching.

Another trend is consolidation of data centers, servers, storage, and networks for higher utilization, for saving cost, for doing more with the available footprint, such as remote office or branch office consolidation. Again, avoid having data all over the place. There is an interesting catch though. If you have remote offices and branch offices, there is a desire to pull all that data and bring it back to the main data center. Wait a minute. We have a challenge here. I have all this storage out at these remote offices or branch offices in these servers, and I want to bring it back, get control. Wait a minute, I do not have floor space. I do not have power. I do not have cooling. So if I bring this stuff back and put it in the data center, I have exacerbated an existing problem.
The approach there is to get control of what is out there, manage it first, and then start moving the data back when you can actually consolidate and support it.

Virtualization is playing a role: virtualization in many different guises. We will talk about that in just a minute, as well as clustering. Virtualization is really focused right now around consolidating, pulling things together, driving up utilization. Clustering is actually going the other way, which is scaling beyond what a single device can do, beyond a single server, beyond a single storage device, to scale for performance (I/O per second, bandwidth), capacity, and connectivity.
This has all led to the proliferation of Windows and NAS file servers. How many people are running network-attached storage or NAS? Okay, keep your hands up. This is always interesting, but of those that did not put your hands up, how many are running Windows-based file sharing? Usually I see a couple more hands go up. Now you can put your hands down.

What is interesting about this is that I talk with people regularly where there is this perception that NAS equals network-attached storage, which equals Network Appliance: a particular vendor, a particular product. Yet there is such a wide proliferation of Windows-based file sharing out there because it is relatively easy to deploy. Consequently, it proliferates just the way PCs proliferated a couple of decades ago. So that proliferation is now forcing management to be involved, to start managing, to make sure the data is protected and backed up and arrays are consolidated.

Right now with server virtualization, we are in this phase again. I want to say the word "again" because this is cyclical. We have been through this many times over the past 30- or 40-some years, depending on how long you have been in the industry. I have been in the industry more than 20 but less than 30 years; somewhere in between, I will let you guess. But I have seen it a couple of times, where we consolidate, distribute, consolidate, distribute, and we are right now back in that phase of pulling everything back in: drive up utilization, reduce cost, maximize what you have, until all of a sudden we see that nice bright light and we think it is the light at the end of a tunnel. It is the headlight of the performance train coming right out of this. So, what do we do? Start spreading things out again. It is cyclical.

So, right now we are trying to consolidate things. More data is being copied, generated, and retained for longer periods of time. There is a lot of confusion.
So, how many people here are currently under mandates to create a greener data center by reducing your carbon footprint, your carbon emissions,
things like that, anybody? A couple. How many people have to address power, cooling, or floor space? Yeah. There is a big disconnect with vendors. Vendors think green means emissions, green means save the planet. Well, it does, but the messaging of a lot of vendors is off just a little bit. They are chasing carbon footprints or emissions as opposed to the real core issue, which is limited floor space, limited power, and limited cooling capability. All you have to do is just make a slight correction.

There is confusion around different forms of storage virtualization. So how many people are running storage virtualization? Okay, a couple. How many people are running virtual tape libraries? Interesting. What that says is that there is this perception that storage virtualization is about LUN pooling, creating pools of storage across different vendors. That is one model. That is one implementation, yet there is also emulation, virtual tape libraries, and transparent data migration. There are many different faces of storage virtualization.

We talked about compliance. There is also confusion about what is an SMB versus a small office, a home office versus a remote or branch office, as well as connectivity options.

I am not going to belabor this. I will make reference to www.greendatastorage.com. There is actually a report there. It is a summary, I think about eight pages, that reduces the 200 pages of the EPA report to Congress on the state of the energy. The EPA is looking at power. In part they have to for the environmental concerns, but here is the interesting number that pulls it right back, which is that data centers only use 1.5% of all electricity in the United States. That is a very, very, very small sliver; by comparison, your home high definition TV sets, satellite receivers, VCRs, and DVRs consume about 1.5%, to put that in relative terms. All the households in the United States use about 30%.
As you know, in your data centers it is all about density. The amount of power that is being consumed in a given cubic foot is the issue. You can compare, for example, a refrigerator, or what a gallon of gas generates for a Chevrolet Tahoe, all kinds of fun stats. I will give you all kinds of good ones, including how much power is required for 100 TB of storage, because that is going to vary.

Real quickly, 50% on average goes to cooling. That other 50% is all over the board as to how much of it is for servers, how much of it is for storage, and how much of it is for network. What impacts it includes the answers to questions like these: Are you I/O intensive? Are you performance intensive, for example, on bandwidth? Are you data intensive, just storing a lot of inactive data, versus running high performance transactional type data? Are you server-centric, with a lot of computing versus a lot of storage? So your mileage will vary. This is essentially a sum of the averages of what I encounter talking with people like yourselves. So it is just a representative point, but it helps me to put some things in perspective.
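To make the arithmetic above concrete, here is a minimal sketch in Python. It takes the talk's figure that about 50% of power goes to cooling on average and divides the remainder among servers, storage, and network. The function name `split_power`, the 1,000 kW facility, and the 50/30/20 split of the non-cooling half are hypothetical values chosen only for illustration; the talk itself says that half is "all over the board" depending on workload.

```python
def split_power(total_kw, cooling_share=0.5, it_shares=None):
    """Split a facility power budget into cooling and IT categories.

    cooling_share defaults to the ~50% average quoted in the talk.
    it_shares describes how the non-cooling remainder is divided.
    """
    if it_shares is None:
        # Hypothetical split of the non-cooling half; NOT from the talk.
        it_shares = {"servers": 0.5, "storage": 0.3, "network": 0.2}
    if not 0.0 <= cooling_share <= 1.0:
        raise ValueError("cooling_share must be between 0 and 1")
    if abs(sum(it_shares.values()) - 1.0) > 1e-9:
        raise ValueError("it_shares must sum to 1.0")

    cooling_kw = total_kw * cooling_share
    remaining_kw = total_kw - cooling_kw

    breakdown = {"cooling": cooling_kw}
    for name, share in it_shares.items():
        breakdown[name] = remaining_kw * share
    return breakdown


if __name__ == "__main__":
    # Hypothetical 1,000 kW facility, for illustration only.
    for name, kw in split_power(1000.0).items():
        print(f"{name:8s} {kw:7.1f} kW")
```

Passing a different `it_shares` dictionary (for example, a storage-heavy archive site versus a compute-heavy one) shows how the same total budget shifts between categories while the cooling half stays fixed.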
Infrastructure resource management, just real quickly. This is rather timely for this event. I talk at a lot of storage-centric events. Their view is the world of storage, but your world is probably more that of the overall data center, a cross-technology domain, meaning servers, storage, and network. The reason I want to bring this up is that more storage management tools are starting to look not just at storage networks but also at the server, to be able to support VMware, to support data protection for VMware, and also to be able to, for example, diagnose problems: if there are problems with backup and data protection, to be able to determine where that problem is.

Is it the tape? Tape is easy to blame. All the problems are tape, right? If anything goes wrong in your data center, just point to tape. But tape often is not the culprit. In fact, many times with backup and data protection, the problems are not really tape. It can be the backup software that makes tape look bad. But those are tough things to change, so what do you do? Change all tape, put in a virtual tape library, and you have a little breathing room. Then some of the problems come back, so you take the virtual tape library off with disk. In a little bit more time, the problem comes back, and the problem might be elsewhere. So this just takes a look at more cross-domain functionalities that are certain to permeate across the data center.