Getting Data from Here to There Much Faster
A White Paper

Abstract
Today's IT environments have spent, and continue to spend, substantial amounts of time and money creating high-performance computing, networking, and storage environments. One item considered a necessary evil because of its high cost is bandwidth; indeed, bandwidth is a significant proportion of operational expense. A whole industry segment has grown up to address bandwidth, especially in WAN environments. What we propose is a different idea: true WAN Optimization versus WAN usage Optimization, i.e., getting the data from Here to There faster!
Introduction
Reading the literature on Transmission Control Protocol/Internet Protocol (TCP/IP) acceleration can quickly give you a severe headache, and not without reason: no amount of Tylenol or Aleve helps one navigate the technical jargon and marketing hype. Making sense of it all is daunting, and applying whatever sense you make of it is even more difficult. In the past few years, an entirely new sub-industry has arisen from the need to get data from Here to There faster. Companies unknown just a short time ago now grace a Gartner Magic Quadrant, the accepted standard not only for the companies that occupy top positions but also a validation of the technologies underlying those companies' products. Generally, if Gartner has a Magic Quadrant for a particular technology and your company ranks highly within it, you're doing something right. Here, we're talking about Wide Area Network Optimization, or the shorter version, WAN Optimization. Riverbed, Silver Peak, Blue Coat, and many others have staked out their positions in this market, and they have been successful, even wildly so. Given that, you might question why BitSpeed would enter this already crowded market, offering up yet another product when the field already has several established leaders. To answer this question, one has to examine just what these market leaders do, why the BitSpeed approach is different, and how that difference will satisfy the needs of many potential users for whom the market leaders offer only partial solutions.

WAN Optimization: The WAN Itself, or Its Use?
There are many big-name WAN Optimization players, most offering appliances that act as proxy servers to network traffic. These appliances (hardware or virtual) sit between your servers and the WAN, prioritizing packets based upon predefined criteria.
The primary purpose is to raise the priority of data packets used in interactive remote applications, providing users of those applications (Microsoft Exchange, SharePoint, and SAP are examples) with better response times at the expense of other users and applications. Some of these solutions offer caching techniques to help speed transactions along. These functions, all wrapped together, are generally termed WAN acceleration. When transferring large files, many of these products also perform some form of data reduction: compression, deduplication, and delta-differencing. But as for actually optimizing the WAN, they're mostly limited to increasing the size of the TCP window to accelerate transmission. Is that enough? The short answer is No! The only case that can be made is that these offerings optimize the USE of the WAN in some circumstances; they don't optimize the WAN itself. And that's a difference well worth noting and remembering. For environments with lots of fresh data every day (new transactions), compression may provide benefit, but deduplication is of little or no value, and it's easy to understand why delta-differencing (looking for groups of bytes that are already present in the target file, which of course assumes that the file, or at least some version of the file, is already at the destination) is meritless in these instances. For video files, whether intended for pre-production editing or post-production distribution, compression is useless because the files are already in a highly compressed form; delta-differencing may negatively impact the purpose of transferring the data file, and the performance implications are serious. Deduplication creates similar issues. The impact of choosing the best way to move data from one point to another can be huge.
Focus on the Network
First, it is necessary to examine some elementary physics. Moving data from one place to another is governed by two immutable physical principles: the speed with which data travels across any communication facility is a function of distance, and the amount of data that can travel at once is a function of available bandwidth. Every organization has some basic bandwidth requirement, and you can always buy additional bandwidth, but that's not optimization. Rather, buying more bandwidth might simply be a very expensive approach that may not actually improve anything at all. Let's explore why this is so. Distance, that is, the distance between Here and There, obviously varies with every organization. You might have locations on the East coast (say, New York) and the West coast (perhaps Los Angeles). You might also need to send data to Europe, South America, Asia, South Africa, India... you get the idea. The distance from Here to There is different for each, but the principles and their implications are the same. Data bits travel at (very roughly) 120,000 miles (or 192,000 kilometers) per second, or 120 miles (192 km) per ms (millisecond), over a communications facility. Bandwidth determines how many bits can travel over the connection simultaneously, but each bit travels at a fixed and unchangeable speed. The time delay in moving any particular group of bits from Here to There is called latency. It is very important to keep in mind that bandwidth and latency are independent of one another. You can increase the bandwidth of a data communications facility, but the latency between any two locations will remain approximately constant. And unless you are extraordinarily lucky, your data bits don't go directly from Here to There. After the data zigzags around a little, going through numerous hubs and routers along the way (transits aptly called "hops"), additional latency is introduced.
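As a quick sanity check on the propagation component alone, the 120-miles-per-millisecond figure above can be turned into a small calculation. This is a minimal sketch; the point-to-point New York to Los Angeles distance of roughly 2,450 miles is an assumption for illustration:

```python
# Rough propagation-delay estimate using the ~120 miles (192 km) per
# millisecond figure cited above. Routing hops add latency on top of this.
MILES_PER_MS = 120.0

def one_way_latency_ms(miles: float) -> float:
    """Pure propagation time, in ms, for a bit to travel the given distance."""
    return miles / MILES_PER_MS

# New York to Los Angeles is roughly 2,450 miles point to point (assumed).
print(round(one_way_latency_ms(2450), 1))  # ~20.4 ms, before any hop delays
```

Note that this best-case figure is well below the roughly 80ms typically observed coast to coast; the difference is exactly the zigzag routing and hop latency described above.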
The average latency for a data communications link between New York and Los Angeles is about 80ms; between Los Angeles and Mumbai, India, latency can be as much as 300ms; satellite connection latency is over 500ms. One very simple truth emerges from this very brief overview of data communications basics: assuming the same bandwidth, as the distance between Here and There increases, the amount of data that can be sent in any given period of time decreases. So if you're sending data over your local area network (LAN), with latencies of 5ms or less, your problems are probably limited to internal LAN congestion; data can really move at close to bandwidth capacity in most cases. But if your Here to There involves sending data coast-to-coast, or across oceans and distant continents, you've got a serious problem. Please look carefully at the following charts, which show the time taken by each of four different transfer protocols to transfer 10 GB (gigabytes) of data from Los Angeles (Here) to various destinations with various latencies (There), using both an OC-3 (155 Mb/sec) line and a Gb (1000 Mb/sec) line:
Chart 1: OC-3 (155.52 Mbits/second, 19.44 MBytes/second) file transfer, compression disabled. The chart plots the time in minutes (0 to 45) to complete a 10-gigabyte file transfer from Los Angeles to New York, Tokyo, Rome, and Sydney using SCP, RSYNC, FTP, and Velocity. Notice that as the distance gets longer, the traditional transfer protocols take longer to transfer the data. With Velocity, the distance and latency are irrelevant to the performance of the file transfer.
Chart 2: Gigabit Ethernet (1000 Mbits/second, 122.07 MBytes/second) file transfer, compression disabled. The chart plots the time in minutes (0 to 45) for the same 10-gigabyte transfers from Los Angeles to New York, Tokyo, Rome, and Sydney using SCP, RSYNC, FTP, and Velocity. The difference is even more dramatic as the bandwidth increases (a Gb line has about 6 times the bandwidth of an OC-3 line). Again, for Velocity, the distance and latency are irrelevant to the performance of the file transfer.
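The pattern in the two charts, where SCP and RSYNC barely improve on the faster line, follows from a well-known ceiling: a TCP sender can keep at most one window of unacknowledged data in flight per round trip, so single-stream throughput is capped at window size divided by round-trip time, regardless of line rate. A minimal sketch, assuming an illustrative 64 KiB window and an 80ms coast-to-coast round trip (assumed figures, not measurements behind the charts):

```python
# TCP throughput ceiling: at most one window of data per round trip,
# independent of the underlying line rate.
def tcp_ceiling_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream TCP throughput in megabits/second."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000

# Assumed example: a classic 64 KiB window over an 80 ms round trip.
print(round(tcp_ceiling_mbps(64 * 1024, 80), 1))  # ~6.6 Mbps
```

At roughly 6.6 Mbps, the cap sits far below both the 155 Mbps OC-3 line and the 1000 Mbps Gigabit line, which is why buying the faster line changes little for window-limited protocols.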
Looking at these two charts together, it is apparent that an OC-3 user who adds additional bandwidth such as Gigabit Ethernet would in most cases be sorely disappointed with the results. Using SCP or RSYNC, the results on each chart are nearly identical, even though tens of thousands of dollars of additional monthly expense have bought about 5.5 times more bandwidth. FTP performs somewhat better, but only Velocity is able to take full advantage of the connection and deliver total bandwidth utilization, independent of distance and the concomitant latency. Note that Velocity on an OC-3 connection (rated at only 155 Mbps) performs about equal to or much better than FTP on a Gigabit connection (rated 6 times faster at 1000 Mbps)! All of which brings us back to the WAN Optimization concept. How do the leaders in WAN Optimization solve the problem of getting data from Here to There faster? The answer is surprisingly simple, though the implementation is not. Rather than optimize the data transmission network, the WAN Optimization leaders cleverly succeed by reducing the amount of data to be transmitted. And while this has been done to some extent for years by utilizing various data compression technologies, the WAN Optimization leaders use newer and sometimes quite effective techniques of data reduction, the most notable of which are data deduplication and delta-differencing. Data deduplication (dedupe) appeared on the commercial scene about five years ago, and has since been developed and enhanced significantly. Dedupe attempts to find multiple instances of data and files, stores only one of the identical files or chunks (parts of files), and substitutes a small representation of the data in the form of a short hash value (usually between 128 and 256 bits) computed with well-known algorithms that offer very high reliability of collision avoidance (that is, an extremely low probability that any single computed hash value could represent data that actually is different). In addition, there have been enhancements made to delta-differencing, which, like deduplication, transfers only those files or parts of files that have changed since the last data transfer. Together, dedupe and delta-differencing provide a powerful combination for reducing the amount of data that needs to be transferred to remote locations on a regular basis, thereby improving WAN performance. But it is important to note that these data reduction techniques, while effective in some cases at reducing the amount of data that transits the network, do not really optimize the network itself, and in many environments, as noted above, have no positive effect at all. Moreover, both of these technologies require substantial infrastructure: powerful processors (usually provided in specially designed "appliances") with high-capacity disk storage to maintain the databases of hash values and single-instance data, and proxy-server-like connections to the WAN communications links. Such an infrastructure can become very expensive. What if your data cannot be compressed? What if your data is unique in every business cycle and doesn't contain files or chunks of data that have been previously stored? In these cases, data deduplication and delta-differencing will have no effect.
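For readers unfamiliar with the mechanics, fixed-size-chunk deduplication can be sketched in a few lines. The 4 KiB chunk size, SHA-256 hash, and in-memory store are illustrative assumptions; commercial products use far more sophisticated chunking and persistent hash databases:

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size

def dedupe(data: bytes, store: dict) -> list:
    """Store each unique chunk once, keyed by its SHA-256 hash, and
    return the list of hashes needed to reconstruct the original data."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is kept
        recipe.append(digest)
    return recipe

store = {}
recipe = dedupe(b"A" * CHUNK_SIZE * 3, store)  # three identical chunks
print(len(recipe), len(store))  # 3 references, but only 1 stored chunk
```

When incoming data is unique in every cycle, every chunk hashes differently: the store grows without shrinking the transfer at all, which is precisely why dedupe yields no benefit for such workloads.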
Many enterprise environments come immediately to mind: stock exchanges and credit card processors with billions of unique transactions every day; audio-visual producers with huge files that have already been compressed and are impossible to dedupe; medical imaging; research laboratories with immense quantities of raw data to be shared with distant colleagues; the list goes on and on. These are the markets that BitSpeed targets for its Velocity software. Rather than concentrate on data reduction (though Velocity can do that, too), BitSpeed has applied an array of innovative techniques and new technologies to vastly improve the performance of data transmission over long distances. Velocity actually transmits data at or near full bandwidth capacity regardless of distance. Coast-to-coast transmissions over the WAN occur as quickly as if the data had been sent over your LAN. Very long distances are no longer a deterrent to getting data from Here to There. Replication of data across and between continents is now a practical and timely possibility. You may even be able to reduce your bandwidth expenditures because Velocity can get the job done on lower-cost data transmission facilities, presenting the possibility of an almost instant return on investment. BitSpeed Velocity requires no new hardware or other infrastructure. Velocity requires no TCP tuning or changes to your operating environment. All of the optimization is done within the software. Versions are available for all major platforms, and all versions are interoperable. Velocity can be initiated locally or remotely over secured communications links. Data transfer progress can easily be tracked on the initiator's local display. Data transfers interrupted due to network issues or system problems can be resumed with no data loss and no retransmission. Velocity is bandwidth-friendly and easily shares the data communications link with other processes.
And Velocity is virtualization-agnostic, easily able to operate within VMware, Xen, Hyper-V, or VirtualBox environments. Velocity will also comfortably co-exist with existing WAN Optimization products to solve the data transmission problems that the existing technologies cannot.
Summary
It should now be clear that there is a huge difference between simple WAN Optimization or application acceleration on the one hand, and truly making full use of bandwidth through latency-suppression techniques that dramatically reduce transfer times on the other. WAN Optimization (as offered by the current industry leaders) and Application Acceleration are good starts, but they are not complete solutions to the problem introduced by latency and its negative effect on total bandwidth utilization. In addition, bandwidth is an expensive and significant proportion of operational expense. Why not use technology that not only maximizes current bandwidth but performs so well that additional bandwidth is not needed, thereby saving money? For maximum performance, it is important to balance the performance of the network, the servers, and the storage to best optimize transfers.

For more information, please contact us at:
BitSpeed LLC
1601 N. Sepulveda Blvd.
Manhattan Beach, CA 90266 USA
Phone +1 562.735.0660
www.bitspeed.com, info@bitspeed.com