1 Best Practices for Data Sharing in a Grid Distributed SAS Environment Updated July 2010
2 B E S T P R A C T I C E D O C U M E N T Table of Contents 1 Abstract Storage performance is critical Required background knowledge Shared file system experience with SAS NAS Appliance (NFS or CIFS) Clustered and shared file system Hybrid NAS and SAN systems - iscsi SAS experience with various shared file systems Comparing various storage architectures for SAS Implementation Guidelines Initial architecture design Throughput requirements for SAS applications Location of SASWORK and Utility Space Keep it simple Design testing Test I/O throughput outside of SAS Example migrations from stand-alone to grid SAS Data Integration example Evaluation process Design approach Multi-user environment example Evaluation process Design approach Sample solution Conclusion Authors Appendix A: Storage basics and definitions
3 1 Abstract Data volumes continue to grow at unprecedented rates. Enterprise solutions must be able to access and process this data efficiently, reliably and quickly in order to produce the results necessary to make solid business decisions. This is true for many SAS customers who look for ways to distribute both their data and or their processing to help them meet ever-shrinking timelines. The performance and success of a solution depends on timely and reliable data access no matter where it resides. Customers are looking to SAS to provide guidance and recommendations on configuration of distributed environments. This paper provides an introduction to basic storage terminology and concerns. The paper describes the best practices used during successful testing with SAS and clustered file systems. This paper can be used as a reference guide when configuring a distributed environment, performing, and scaling to meet the needs of your organization. 1.1 Storage performance is critical Storage performance is the most critical component of implementing SAS in a distributed grid environment. When your storage subsystem is not designed properly your application does not perform well. The performance and architecture of the storage required to support SAS software is critical. 1.2 Required background knowledge There is basic terminology, definitions, and concepts that must be understood when building a grid distributed SAS environment. If you already have a strong background in this area, please read on. If you don t, you can find an overview of critical terms and technologies in the appendix of this document. 1.3 Shared file system experience with SAS Defined below are three types of shared file systems that SAS has tested in various environments. These types are Network Attached Storage, Clustered / Shared File Systems and Hybrid File Systems.
4 1.4 NAS Appliance (NFS or CIFS) When you purchase the Network Appliance FAS Storage device, commonly referred to as a NetApp filer, you can leverage standard Ethernet networks to share storage among storage clients using network sharing protocols like NFS or CIFS. There are slight differences between these protocols but typically the technology is the same. The advantage to this storage is that it is usually inexpensive and extremely easy to deploy. It is typically not recommended to use this type of storage for distributed SAS environments that require a large amount of I/O performance. However, as networking throughput and technology improves, the cost and ease of deploying this type of storage makes it a very attractive option. Advantages Challenges Best practices Easy to deploy and administer. Low cost shared file system option. Less scalability than SAN based CFS. Poorer performance due to packetized network traffic (but newer 10 GB Ethernets could help). Use a dedicated Ethernet interface(s) on storage clients for storage traffic. Use more than one Ethernet per storage client for availability. Can be configured for high availability. Does not support load balance across multiple Ethernet interfaces easily (extra software and hardware or manual load balancing required) Use high speed networks like 10 GB Ethernet when possible. Ensure NFS server can meet server throughput requirements of storage clients (NFS servers typically don t provide the throughput capability of standard storage devices).
5 Mounted Volume 1 Ethernet Network Computer(s) Mounted Volume 1 Volume 1: Shared File NAS Device (NetApp FAS) System Figure 1: NAS architecture diagram
6 1.5 Clustered and shared file system The most commonly implemented storage architecture for SAS software in a distributed environment is a Clustered or Shared File System (CFS). In this architecture, standard storage devices available from various hardware vendors are shared via SAN architecture (for example: connected together with a fiber channel switch). CFS is software component that is installed on all of the storage clients. Through various processes running on the storage clients, it enables the sharing of metadata and controls simultaneous access. There are many types of CFS software available from both third-party and hardware vendors. Advantages Challenges Best practices Typically the best performing shared file system architecture Typically the best performing shared file system architecture Not all products support multiple operating systems More expensive than NAS Harder to administer than NAS Requires dedicated storage network (typically fiber channel). Understand the metadata requirement for the CFS. Dedicated server and storage might be required to store metadata. Metadata server response time can be critical to overall CFS performance. Monitor throughput performance from storage client all the way to the disk volumes in the storage device.
8 1.6 Hybrid NAS and SAN systems - iscsi There are several iscsi based file systems on the market. EMC s Multi-Path File System (MPFS) is a clustered and shared file system that can run over a pure fiber SAN or be used over iscsi (MPFSi). SAS has done extensive testing with the MPFSi product depicted in figure 3. Storage clients are connected to a standard Ethernet and specialized equipment converts the network traffic (IP) over to storage protocols. These protocols are sent down to the storage devices by fiber channel. MPFS is based on EMC s previous Highroad product. Many different vendors offer iscsi storage products. Advantages Challenges Best practices Faster throughput than pure IP based storage due to use of storage protocols More complex to administer and monitor Use a dedicated Ethernet interface(s) on storage clients Strong failover and load balancing capability with multiple Ethernet interfaces More expensive than NAS solution Use more than one Ethernet per storage client for availability and performance Monitor throughput capability at all layers of the architecture Seek design help from the vendor to ensure throughput requirements can be met
9 Mounted Volume 1 Mounted Volume 1 Computer 1 Ethernet Switch Network Computer 2 Ethernet to SCSI Conversio n Shared File System Volume 1 iscsi Storage Device(s) Figure 3: Hybrid NAS and SAN architecture diagram
10 1.7 SAS experience with various shared file systems Each of these falls into one of three categories: Hybrid, NAS, or CFS. Following this table is an overview of each category including some advantages, challenges, and best practices for implementing them with SAS in a distributed or grid environment. Operating system used in tests at SAS* Red Hat Linux (RHEL 4) File Sharing Technology EMC Celerra Multi-Path File System on iscsi (MPFSi) Category Hybrid Red Hat Linux (RHEL 4) Network Appliance (NFS) NAS Sun Solaris 10 Sun StorageTek QFS CFS Red Hat Linux (RHEL 4) Global File System (GFS) CFS Microsoft Windows HP Polyserve Matrix CFS Red Hat Linux (RHEL 4) HP Polyserve Matrix CFS IBM AIX IBM Global Parallel File System (GPFS) CFS HP-UX Veritas Clustered File System (CFS) CFS Microsoft Windows IBM Global Parallel File System (GPFS) CFS Microsoft Windows Microsoft CIFS NAS * Testing level and experience of each file sharing technology varied.
11 1.8 Comparing various storage architectures for SAS Listed in the table below are the primary storage types along with some of the advantages and challenges of each. Highlighted in bold in the table are some of the most important points to consider when choosing a storage option. Storage type Advantages Challenges Predictable performance. Direct Attached Storage (DAS) Network Attached Storage (NAS) Data shareable Storage Area Network (SAN) + Clustered File System (CFS) Data sharable Hybrid (iscsi) Data sharable Easy administration. Good for temporary storage in a distributed environment where data doesn t need to be shared. Easy to access since Ethernet is a standard across operating systems. Typically has shared file system capability built in (NFS or CIFS). Easy administration. Typically lower cost to implement (switches, built in Ethernet cards, cabling.). 10 gigabit Ethernet now available to help with performance. Allows point-to-point communication between storage and host systems. Transfer of data between storage devices does not require host CPU or memory cycles. Higher throughput rates than NAS, similar to DAS performance. Higher availability can be achieved versus NAS due to architecture. Uses existing Ethernet for clients. File system sharing part of the solution. Easy administration on the client side. Faster than NAS solution. File system and data not shareable Can be underutilized since it cannot be shared among multiple servers. Predictable performance and throughput rates harder to achieve due to other network activity. Scalability is not linear as you increase infrastructure. Data is packaged into packets and reassembled at target, which can cause slower throughput rates. Typically more complex to administer Requires specialized network infrastructure. More expensive due to extra infrastructure and extra cost of clustered file system (CFS). Often involves multiple manufacturers to create optimal infrastructure. Can be as expensive as SAN and CFS solution. More complex to administer on the server and network side, more physical components to monitor and administer Not as fast as a SAN based systems.
12 2 Implementation Guidelines The following lists of items (pre-implementation, test, and design considerations) have been assembled from SAS experience with distributed environments and storage. These guidelines can help you plan your migration of SAS applications to a distributed environment. Is my application or environment a good candidate for distribution? Use the guidelines defined in the first section of this paper on when to or when not to move your application to a distributed environment. Draw a data flow diagram Draw a diagram of the environment that helps you understand how data moves through your application. Include the amounts of data and throughput requirements. Figure 4 outlines network, storage, and server requirements. Below is an example of a data flow diagram for a grid enabled ETL process. Each phase of the process is shown with details of the volumes of data that is accessed during the phase. Understanding throughput requirements during execution is critical to the storage architecture design. Use the process attached in the appendix of this paper to help analyze your applications data flow requirements. Figure 4: Sample data flow diagram
13 2.1 Initial architecture design Draw a detailed diagram of the distributed architecture. This helps identify required hardware, network, and software that you need. Be sure to label all your storage and network connections (type, sustained throughput capability, protocol). 2.2 Throughput requirements for SAS applications It is always best to use measured I/O throughput statistics from your application when building out your data architecture. However, when there are no real numbers the following can be used as general estimates for calculating throughput requirements for SAS: Ad hoc SAS Power Users (heavy) SAS Enterprise Guide / Microsoft Add-In (medium) Business intelligence users (light) MB per second simultaneous process 15 MB per second simultaneous user 1 MB per second simultaneous user Note: Many processes or applications running in a distributed fashion are classified in the first category as heavy. 2.3 Location of SASWORK and Utility Space A common question asked, is where to locate SAS temporary storage in a distributed environment. Should the temporary storage by local to the grid node, or on the network shared storage? This chart highlights the advantages and challenges of each choice. Which one you choose is dependant on your application s requirements. Where to put SAS temporary space? Local or DAS On the shared storage Advantages Predictable Performance Might be faster performance, especially if you have a heavily utilized shared storage environment. Less SAN / CFS traffic, therefore better overall performance. File system failure for SAS temporary space takes one node down (however with RAID, this is less of an issue). More disk spindles available for temporary storage, so performance could be better. No wasted disk space since all storage is shareable. One place to administer storage. Challenges Not shareable disk space, so it might sit idle. Takes up expensive real estate, maybe you want to use smaller blade servers. What do you do if you want to expand temporary storage and don t have extra disk slots. Shared resources increase likelihood of unpredictable performance Shared storage is probably more expensive than local attached More storage area bandwidth is required due to more traffic.
14 A typical SAS environment has separate file systems for the following items: SAS Data SASWORK SAS Utility Could be multiple file systems of varying sizes. Temporary storage for SAS heavily accessed in data intensive applications. Heavily used for things like SAS sort, by default included with SASWORK. SAS Binaries Location of SAS and third-party application smaller read only file system. How many you actually deploy can vary. This is based on the ability of the file system to handle certain sized disk volumes and how easily as well as the ease of growing their size in the future. The number and size of file systems is much less important than the performance of these file systems for your application. Focus on performance first. In the past the RAID level of choice for SAS was mostly RAID 0, which is great for performance but provides no redundancy. Today, most customers are implementing RAID 5 file systems for almost everything. Dedicated hardware on the storage devices to generate RAID redundancy information (parity) has improved in performance and increased RAID 5 performance. The performance is equal to mirroring and stripping combined (RAID 1+0, RAID 10). The SAS temporary space should be implemented with redundancy; otherwise any disk failure would take down all SAS processes running across your system. 2.4 Keep it simple Distributing an application across a grid requires a different type of thinking than that of a single server implementation. It is important not to over design and make your architecture so complex that it is difficult to monitor, expand, diagnose, tune, and monitor. Be sure to consider if the performance improvement is worth the additional cost for administration and usability of a distributed environment. 2.5 Design testing It is a common mistake to minimize the data volume during testing of any solution or product because of a lack of available resources or time. This results in an unrealistic test as the hardware (both storage and servers) caches data. A test that cheats and uses cache for most of the data access does not truly test the entire system and sets false expectations. Be sure you use enough data in your tests to saturate hardware cache for storage and system memory. 2.6 Test I/O throughput outside of SAS When you build out your test environment or the final distributed architecture, it is critical to test the I/O throughput capability before you install or test SAS. Listed in the reference section of this paper are some links to tools that can be used to test I/O throughput outside of SAS. Getting a baseline of
15 the hardware before executing your SAS application will help you isolate performance issues after you deploy your application. If your baseline is not what you expect the system to be capable of, this allows you to tune and fix the architecture before you implement SAS. Doing I/O tests early saves you time and frustration in the deployment phase. 2.7 Example migrations from stand-alone to grid Here are two possible examples of applications that are good candidates to consider moving to a distributed or grid environment (there are many others). The first example is a custom SAS Data Integration or ETL application. The second is a large multi-user environment that grew beyond the ability that a single server can handle. These examples give you a high-level idea of the process o you when determining if and when it is beneficial to move to a distributed environment. determine whether. Financial considerations, although important, are not included in this technically focused discussion SAS Data Integration example Problem: Customer has multiple Terabytes of data that they need to process into a central Teradata warehouse on a nightly basis. The processing window is shrinking and the data volumes are growing. The data comes into the SAS server through hundreds of individual input streams that are stored in separate ASCII text files. The customer is not able to complete the ETL jobs in the required window. The current 8 CPU server is 100 percent busy. A single serial SAS program was written to handle the ETL process of the incoming files Evaluation process From first glance, this example appears to be a good candidate for grid. The technical questions to ask are the following: What is the I/O requirement per process? Are you sure CPU bound? What is the ETL processing window? How long does it take to process the largest input stream? Average runtime? How many input streams to you need to process? If applicable, what is the bandwidth of the connection to the Database and or other data resources?
16 2.7.3 Design approach Make sure the SAS program can be split into separate programs that can execute simultaneously across multiple grid nodes. Determine how many simultaneous SAS threads or processes are needed to complete the ETL task in the required window of time. When larger SAS processes and tasks take longer to run than the execution window allows, look at implementing parallelization inside the SAS process. Break the processing of larger files into a parallel process. In order to break apart your data, you need to look for processing dependencies. If there are any, you can not be able to parallelize within the program. Draw a data flow diagram for the problem: include throughput, runtime, and data volume requirements. Analyze your data flow diagram and begin to design your hardware architecture. Test your design in a lab or validate it by comparing to existing reference architectures. Sample solution This problem is an ideal candidate for a SAS grid implementation using CFS or hybrid storage architecture. Although this is a simplified example, the data volumes listed here are better suited for these two types of storage (higher throughput requirements). When the data volumes were lower and there was more emphasis on CPU utilization, then a NAS solution could be used. An example solution for this would be multiple 4-way servers attached via a fiber channel switch to a large storage array(s). A CFS would be used to provide for file sharing across the grid nodes. SAS Grid Manager would be used in combination with SAS Data Integration Server to deploy and manage SAS across the grid. 2.8 Multi-user environment example Problem: Customer has many concurrent SAS users sharing a single very large multiple CPU server (SMP system). The system response is unpredictable and administrators notice lengthy periods where CPU is at or close to 100 percent utilization (sometimes causing a server crash). There is no room in the current hardware to add CPU or I/O resources and the user base and demands on the server continue to increase. The customer uses SAS Business Intelligence, Microsoft Add-In, SAS Enterprise Guide and also has many ad hoc SAS users. There are many ETL processes and many SAS stored processes are spawned by the SAS applications. The data warehouse is a combination of SAS data and remote access to an RDBMS.
17 2.8.1 Evaluation process This example also appears to be a good candidate for a grid or distributed environment since growth appears to be an issue and there are many simultaneous, but different users. The technical questions to ask are: What is the I/O requirement per process? (Are you sure you are CPU bound?) How many users in each class and what are the response time requirements? What are the data volumes that each user needs to access and what are the typical throughput requirements in MB per second? (When possible draw a data flow diagram.) If applicable, what is the bandwidth of the connection to the database and or other data resources? Are there user groups that could be isolated to different grid nodes in order to improve response time? Are there priority users? (Grid scheduling can help with this.) Design approach Make sure that all the SAS applications can be executed on a grid. Implementing the SAS applications on the grid should not impact the way the users access SAS today. Estimate the number of CPUs and I/O storage that need (estimate from data you can get from the existing environment be sure to factor in growth). Are there tasks that can t be run inside of current runtime requirements? If so, can these tasks be parallelized to execute across multiple nodes (parallelism inside the process itself). Draw a data flow diagram for the problem: include user types, throughput, runtime, and data volume requirements. Analyze your data flow diagram and begin to design your hardware architecture. Test your design in a lab or validate it by comparing to existing reference architectures Sample solution This problem is an ideal candidate for a SAS grid implementation and probably best suited for CFS or hybrid storage architecture. Although this is a simplified example, the data volumes listed here are better suited for these two types of storage (higher throughput requirements). However, if the data volumes are lower for some applications / user types, it could be possible to use NAS for some of the user groups. This could help reduce overall cost, but would also increase complexity if you use multiple storage types.
18 An example solution for this would be multiple grid nodes attached via a fiber channel switch to a large storage array(s). A CFS or hybrid shared file system would be used to provide for file sharing across the grid nodes. SAS Grid Manager would be used in combination with the SAS products to deploy and manage the users across the grid.
19 3 Conclusion Grid distributed systems can be implemented on many different storage technologies. It is important to always spend extra time when designing and configuring your storage architecture to meet your applications requirements. Improper storage configuration is the number one reason for poor performing SAS grid distributed environments. The guidelines outlined in this paper can help you design a successful solution. 3.1 Authors Cheryl Doninger Research and Development Tom Keefer Enterprise Excellence Center
20 Appendix A: Storage basics and definitions Storage systems are a complex and detailed part of today s information technology systems. This section defines some of the background and definitions that are required to understand the concepts for setting up a distributed environment. Storage device A device where data is physically stored. Data remains stored on a storage device even when the power is removed. This could also be referred to as an array, cabinet, or storage appliance. File system A component that manages the data stored on a storage device. The file system handles things like storing, retrieving, caching, and buffering data. Storage Client A computer or server that accesses storage over an Ethernet or storage area network. In a grid environment this would be referred to as a grid node. DAS This stands for Direct Attached Storage. Direct attached storage is one or more disk drives located within a computer cabinet and connected directly to the CPU (this also includes internal storage). Examples file systems for direct attached storage: AIX JFS, AIX JFS2, Veritas VXFS, and Sun Microsystems ZFS. Note: File systems all have support matrices. Refer to the vendor support matrix of a particular filesystem to determine whether your environment is supported. JBOD This stands for Just a Bunch of Disks. This is an older acronym used when talking about a group of hard drives attached directly to a computer via a fiber channel, IDE, or SCSI connection. These disks are connected without the benefit of features typically found in advanced storage subsystems like file cache and or RAID controllers. These types of hard disks are managed directly by operating system tools and require CPU resources from the host computer system to manage. Switch A switch is a device used to connect multiple devices or computers to a network. The switch is essentially a central connection point for these devices. Ethernet and Fiber Channel switches are the most common types of switches used in computer architecture. Ethernet switches are used in TCP/IP networking and Fiber Channel switches are primarily used for storage networks. NAS This stands for Network Attached Storage. NAS systems are typically computers with a specialized operating system designed to efficiently manage and share large amounts of storage over a network. NAS systems are usually accessed over a TCP/IP computer network rather than being directly connected to a computer. This allows multiple computers to share the same storage space which minimizes overhead by allowing centralized management of the hard disks. NAS systems are accessed over Ethernet using a file based protocol such as NFS or Microsoft Common Internet File System (CIFS). The performance of NAS systems depends heavily on memory cache (the equivalent of RAM) and network interface overhead (the speed of the router and the network cards). The benefit
21 is that the NAS system can share storage with any computer on the network. The disadvantage of NAS is that any network performance inefficiencies affect storage access for all systems accessing the network storage. Network Appliance and EMC are examples of companies that sell NAS devices. NAS devices come with the file-system: NFS for UNIX or CIFS for Windows. Computer 1 Ethernet Switch / Network Computer 2 Network Attached Storage Device Figure 5: Network Attached Storage (NAS) diagram shared data NFS stands for Network File System. NFS is the built in network file system in a UNIX and or Linux operating system (CIFS is the Windows equivalent). This UNIX standard was developed by Sun Microsystems Inc. NFS allows computers to access data over a network as easily as if the data is located on the local disks. NFS client capability is typically included for free with most major UNIX operating systems.
22 Volume 1 appears local to Clients Volume 1: Mounted at Client via NFS Ethernet Network Computer (NFS Client) NFS Server (NAS Device) Volume 1: NFS Shared File System Figure 6: Network File System (NFS) Diagram Shared Data CIFS Common Internet File System CIFS is a network file system designed by Microsoft for sharing data across a local network (similar to NFS but for Windows). SMB (Server Message Block) was its predecessor (designed by IBM and Microsoft). The previous diagram for NFS also applies to CIFS. Many NAS devices also support CIFS as a mechanism for sharing files to Windows clients. CIFS client capability is built into all Microsoft Operating Systems. SAN Storage Area Network SANs enable multiple hosts to share a storage device but not data or the same physical storage blocks. It is important to note that a shared file-system is required in conjunction with a SAN to share data between multiple hosts. SANs use a block-based protocol which generally runs over an independent, specialized storage network, but not Ethernet. SANs typically use a fibre channel switch to allow multiple hosts to connect to the same storage device. When a shared file system is installed in conjunction with a SAN, data can be shared by multiple hosts. Examples of shared file systems include: Redhat GFS (Linux); IBM GPFS (AIX); HP Polyserve Matrix (Windows or Linux) and Sun Microsystems QFS (Solaris). It is important to check to determine whether your operating system, storage hardware, and network equipment are supported by your shared file system choice. Refer to the vendor support matrix of a particular file-system to see whether your environment is supported.
23 Note: The following file systems are NOT shared file systems. They are for single server access only: AIX JFS, AIX JFS2, Veritas VXFS, Sun Microsystems ZFS, Windows FAT32, and Windows NTFS. Shared file systems are typically not included with the operating system and are an additional expense. Mounted Volume 1 Computer Fiber 1 Channel Switch Mounted Volume 2 Computer 2 Volume 1 Shared Storage Device(s) Volume 2 Figure 7: Storage Area Network (SAN) Diagram No Data Sharing CFS stands for Clustered File System (sometimes more logically referred to as a shared file system). A CFS allows multiple servers to simultaneously share the same data from a shared SAN. Clustered file systems are considered enterprise storage because they can enable scalable highperformance, high availability, simplified management, and data integrity. HP Polyserve Matrix Server, Sun QFS, IBM GPFS, and Linux GFS are all types of CFS. The critical features these file systems provide are data sharing along with data locking. Data locking prevents multiple servers from writing to the same data/file at the same time. The term cluster is used in the name as the shared file system was originally focused at sharing file systems between servers for high availability. Over time CFS has become more widely used as a mechanism to share data for computational collaboration across multiple compute engines (grid nodes).
24 Mounted Volume 1 System 1 Fiber Channel Switch Mounted Volume 1 System 2 Shared File System Volume 1 Shared Storage Device(s) Figure 8: Clustered File System (CFS) or Shared File System Diagram shared data iscsi Internet or IP based storage protocol. iscsi allows for the transmission of storage protocols over standard TCP/IP networks (Ethernet). Storage can be easily shared across an existing network versus running new dedicated storage protocol connections. Although not as fast as dedicated or pure storage protocols, it has the advantage of lower cost and is easy to deploy on an existing network. Hybrid File Systems These file systems provide the same shared capability as a CFS, but they usually are a combination between NAS and SAN storage. Examples of these technologies are iscsi based file systems like EMC s Multi-Protocol File System (MPFSi). Storage clients accessing the file system are connected to only Ethernet cables which are in turn connected to storage devices which are connected to the network. Special network equipment and software help do the translation of Ethernet IP packets to storage protocol. These file systems vary by implementation. You should talk to your vendor of choice for exact details. Both NFS and CIFS can be supported in these environments which makes it easy to connect most operating systems (Windows or UNIX).
25 Figure 9: Considerations when building storage architecture
26 SAS INSTITUTE INC. WORLD HEADQUARTERS SAS CAMPUS DRIVE CARY, NC TEL: FAX: U.S. SALES: SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright 2008, SAS Institute Inc.All rights reserved
Overview of I/O Performance and RAID in an RDBMS Environment By: Edward Whalen Performance Tuning Corporation Abstract This paper covers the fundamentals of I/O topics and an overview of RAID levels commonly
Technical Paper A Survey of Shared File Systems Determining the Best Choice for your Distributed Applications A Survey of Shared File Systems A Survey of Shared File Systems Table of Contents Introduction...
ESG Lab Review EMC MPFSi: Accelerating Network Attached Storage with iscsi A Product Review by ESG Lab May 2006 Authors: Tony Asaro Brian Garrett Copyright 2006, Enterprise Strategy Group, Inc. All Rights
Optimizing Large Arrays with StoneFly Storage Concentrators All trademark names are the property of their respective companies. This publication contains opinions of which are subject to change from time
Paper 220-29 SAS System for Windows: Integrating with a Network Appliance Filer Mark Hayakawa, Technical Marketing Engineer, Network Appliance, Inc. ABSTRACT The goal of this paper is to help customers
Technical Paper Moving SAS Applications from a Physical to a Virtual VMware Environment Release Information Content Version: April 2015. Trademarks and Patents SAS Institute Inc., SAS Campus Drive, Cary,
The Revival of Direct Attached Storage for Oracle Databases Revival of DAS in the IT Infrastructure Introduction Why is it that the industry needed SANs to get more than a few hundred disks attached to
POWER ALL GLOBAL FILE SYSTEM (PGFS) Defining next generation of global storage grid Power All Networks Ltd. Technical Whitepaper April 2008, version 1.01 Table of Content 1. Introduction.. 3 2. Paradigm
EMC Unified Storage for Microsoft SQL Server 2008 Enabled by EMC CLARiiON and EMC FAST Cache Reference Copyright 2010 EMC Corporation. All rights reserved. Published October, 2010 EMC believes the information
ABSTRACT Paper SAS6760-2016 Architecting Your SAS : Networking for Performance Tony Brown and Margaret Crevar, SAS Institute Inc. With the popularity of network-attached storage for shared file systems,
WHITE PAPER: customize DATA PROTECTION Confidence in a connected world. Best Practice for NDMP Backup Veritas NetBackup Paul Cummings January 2009 Best Practice for NDMP Backup Veritas NetBackup Contents
an introduction to networked storage How networked storage can simplify your data management The key differences between SAN, DAS, and NAS The business benefits of networked storage Introduction Historical
WHITE PAPER Data Protection Solutions for Network Attached Storage VERITAS Backup Exec 9.0 for Windows Servers VERSION INCLUDES TABLE OF CONTENTS STYLES 1 TABLE OF CONTENTS Background...3 Why Use a NAS
Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance
White Paper Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability The new TCP Chimney Offload Architecture from Microsoft enables offload of the TCP protocol
iscsi: Accelerating the Transition to Network Storage David Dale April 2003 TR-3241 WHITE PAPER Network Appliance technology and expertise solve a wide range of data storage challenges for organizations,
EMC Perspective Integrated Grid Solutions from SAS, EMC Isilon and Greenplum Introduction Intensifying competitive pressure and vast growth in the capabilities of analytic computing platforms are driving
Scalable NAS for Oracle: Gateway to the (NFS) future Dr. Draško Tomić ESS technical consultant, HP EEM 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change
by Andrey Kvasyuk, Senior Consultant July 7, 2005 Introduction For years, the IT community has debated the merits of operating systems such as UNIX, Linux and Windows with religious fervor. However, for
Red Hat Enterprise Linux as a file server You re familiar with Red Hat products that provide general-purpose environments for server-based software applications or desktop/workstation users. But did you
Whitepaper Consolidation EXECUTIVE SUMMARY At this time of massive and disruptive technological changes where applications must be nimbly deployed on physical, virtual, and cloud infrastructure, Red Hat
WHITE PAPER W I N T E R C O R P O R A T I O N SCALABLE NETWORKED STORAGE Convergence of SAN and NAS with HighRoad www.wintercorp.com SPONSORED RESEARCH PROGRAM Scalable Networked Storage: Convergence of
EMC Backup and Recovery for Microsoft SQL Server 2008 Enabled by EMC Celerra Unified Storage Applied Technology Abstract This white paper describes various backup and recovery solutions available for SQL
Seradex White Paper A Discussion of Issues in the Manufacturing OrderStream Microsoft SQL Server High Performance for Your Business Executive Summary Microsoft SQL Server is the leading database product
NAS or iscsi? Selecting a storage system White Paper 2007 Copyright 2007 Fusionstor www.fusionstor.com No.1 2007 Fusionstor Inc.. All rights reserved. Fusionstor is a registered trademark. All brand names
Phone: (603)883-7979 email@example.com Cepoint Cluster Server CEP Cluster Server turnkey system. ENTERPRISE HIGH AVAILABILITY, High performance and very reliable Super Computing Solution for heterogeneous
Network Storage for Business Continuity and Disaster Recovery and Home Media White Paper Abstract Network storage is a complex IT discipline that includes a multitude of concepts and technologies, like
White Paper From Ethernet Ubiquity to Ethernet Convergence: The Emergence of the Converged Network Interface Controller The focus of this paper is on the emergence of the converged network interface controller
Scale and Availability Considerations for Cluster File Systems David Noy, Symantec Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.
INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT UNPRECEDENTED OBSERVABILITY, COST-SAVING PERFORMANCE ACCELERATION, AND SUPERIOR DATA PROTECTION KEY FEATURES Unprecedented observability
Contributions for this vendor neutral technology paper have been provided by Blade.org members including NetApp, BLADE Network Technologies, and Double-Take Software. June 2009 Blade.org 2009 ALL RIGHTS
SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server
Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without
Database Configuration: SAN or NAS Discussion of Fibre Channel SAN and NAS Version:.0 Greg Schulz DBSANNAS.DOC Preface This publication is for general guidance and informational purposes only. The information
How to Choose your Red Hat Enterprise Linux Filesystem EXECUTIVE SUMMARY Choosing the Red Hat Enterprise Linux filesystem that is appropriate for your application is often a non-trivial decision due to
Technical white paper Using HP StoreOnce Backup Systems for NDMP backups with Symantec NetBackup Table of contents Executive summary... 2 Introduction... 2 What is NDMP?... 2 Technology overview... 3 HP
The Wide Spread Role of 10-Gigabit Ethernet in Storage This paper provides an overview of SAN and NAS storage solutions, highlights the ubiquitous role of 10 Gigabit Ethernet in these solutions, and illustrates
Technical Paper Performance and Tuning Considerations for SAS on Fusion-io ioscale Flash Storage Release Information Content Version: 1.0 May 2014. Trademarks and Patents SAS Institute Inc., SAS Campus
Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability
Paper SAS1501-2015 Best Practices for Configuring Your I/O Subsystem for SAS 9 Applications Tony Brown, Margaret Crevar, SAS Institute Inc., Cary, NC ABSTRACT The power of SAS 9 applications allows information
TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer
Cloud Service Provider Builds Cost-Effective Storage Solution to Support Business Growth Overview Country or Region: United States Industry: Hosting Customer Profile Headquartered in Overland Park, Kansas,
Panasas at the RCF HEPiX at SLAC Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory Centralized File Service Single, facility-wide namespace for files. Uniform, facility-wide
SQL Server Business Intelligence on HP ProLiant DL785 Server By Ajay Goyal www.scalabilityexperts.com Mike Fitzner Hewlett Packard www.hp.com Recommendations presented in this document should be thoroughly
Simple, Reliable and Affordable Vess Family Overview VessRAID FC RAID Storage Systems Fiber Channel s dominance for applications requiring high substantial bandwidth in the market still remains high as
Technical Paper A Survey of Shared File Systems Determining the Best Choice for your Distributed Applications Updated 22 Oct 2014 A Survey of Shared File Systems A Survey of Shared File Systems Table of
Using High Availability Technologies Lesson 12 Skills Matrix Technology Skill Objective Domain Objective # Using Virtualization Configure Windows Server Hyper-V and virtual machines 1.3 What Is High Availability?
89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report
Post-production Video Editing Solution Guide with Microsoft SMB 3 File Serving AssuredSAN 4000 Dot Hill Systems introduction 1 INTRODUCTION Dot Hill Systems offers high performance network storage products
Perforce with Network Appliance Storage Perforce User Conference 2001 Richard Geiger Introduction What is Network Attached storage? Can Perforce run with Network Attached storage? Why would I want to run
Microsoft SQL Server Enabled by EMC Celerra and Microsoft Hyper-V Copyright 2010 EMC Corporation. All rights reserved. Published February, 2010 EMC believes the information in this publication is accurate
VERITAS Business Solutions for DB2 V E R I T A S W H I T E P A P E R Table of Contents............................................................. 1 VERITAS Database Edition for DB2............................................................
1/27 Oracle Storage Options RAW, ASM, CFS, for Real Application Cluster Unterföhring, 11.2005 M. Kühn 1 2/27 Be spoilt for choice? RAC Review Introduction Storage Options RAW In short ASM Automatic Storage
PolyServe Matrix Server for Linux Highly Available, Shared Data Clustering Software PolyServe Matrix Server for Linux is shared data clustering software that allows customers to replace UNIX SMP servers
Software-defined Storage Architecture for Analytics Computing Arati Joshi Performance Engineering Colin Eldridge File System Engineering Carlos Carrero Product Management June 2015 Reference Architecture
Performance evaluation sponsored by NetApp, Inc. Introduction Ethernet storage is advancing towards a converged storage network, supporting the traditional NFS, CIFS and iscsi storage protocols and adding
Martin County Administration Information Technology Services Proposal For Storage Area Network Systems Supporting WinTel Server Consolidation February 17, 2005 Version: DRAFT 1.4 Tim Murnane, Systems Administrator
EMC Backup and Recovery for Microsoft SQL Server Enabled by Quest LiteSpeed Copyright 2010 EMC Corporation. All rights reserved. Published February, 2010 EMC believes the information in this publication
Moving Virtual Storage to the Cloud White Paper Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage www.parallels.com Table of Contents Overview... 3 Understanding the Storage
THE CLOUD STORAGE ARGUMENT The argument over the right type of storage for data center applications is an ongoing battle. This argument gets amplified when discussing cloud architectures both private and
Large Scale Storage Orlando Richards, Information Services firstname.lastname@example.org LCFG Users Day, University of Edinburgh 18 th January 2013 Overview My history of storage services What is (and is not)
Disk-to-Disk Backup & Restore Application Note All trademark names are the property of their respective companies. This publication contains opinions of StoneFly, Inc., which are subject to change from
Introduction to Gluster Versions 3.0.x Table of Contents Table of Contents... 2 Overview... 3 Gluster File System... 3 Gluster Storage Platform... 3 No metadata with the Elastic Hash Algorithm... 4 A Gluster
Cray : Data Virtualization Service Stephen Sugiyama and David Wallace, Cray Inc. ABSTRACT: Cray, the Cray Data Virtualization Service, is a new capability being added to the XT software environment with
Building High-Performance iscsi SAN Configurations An Alacritech and McDATA Technical Note Building High-Performance iscsi SAN Configurations An Alacritech and McDATA Technical Note Internet SCSI (iscsi)
Dell High Availability Solutions Guide for Microsoft Hyper-V www.dell.com support.dell.com Notes and Cautions NOTE: A NOTE indicates important information that helps you make better use of your computer.
Storage Networking Foundations Certification Workshop Duration: 2 Days Type: Lecture Course Description / Overview / Expected Outcome A group of students was asked recently to define a "SAN." Some replies
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
DELL RAID PRIMER DELL PERC RAID CONTROLLERS Joe H. Trickey III Dell Storage RAID Product Marketing John Seward Dell Storage RAID Engineering http://www.dell.com/content/topics/topic.aspx/global/products/pvaul/top
Ultra-Scalable Storage Provides Low Cost Virtualization Solutions Flexible IP NAS/iSCSI System Addresses Current Storage Needs While Offering Future Expansion According to Whatis.com, storage virtualization
STORAGE CENTER DATASHEET STORAGE CENTER Go Beyond the Boundaries of Traditional Storage Systems Today s storage vendors promise to reduce the amount of time and money companies spend on storage but instead
WHITE PAPER BMC Recovery Manager for Databases: Benchmark Study Performed at Sun Laboratories BMC delivers extraordinarily fast Oracle backup and recovery performance close to two terabytes per hour Table
The Panasas Parallel Storage Cluster What Is It? What Is The Panasas ActiveScale Storage Cluster A complete hardware and software storage solution Implements An Asynchronous, Parallel, Object-based, POSIX
SAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform INTRODUCTION Grid computing offers optimization of applications that analyze enormous amounts of data as well as load
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
Concurrent data access and fast failover for unstructured data and Oracle databases recovery at a fraction of the cost of Oracle RAC Improve application performance and scalability - Get parallel processing
Software to Simplify and Share SAN Storage Creating Scalable File-Serving Clusters with Microsoft Windows Storage Server 2008 R2 and Sanbolic Melio 2010 White Paper By Andrew Melmed, Director of Enterprise
Best Practice of Server Virtualization Using Qsan SAN Storage System F300Q / F400Q / F600Q Series P300Q / P400Q / P500Q / P600Q Series Version 1.0 July 2011 Copyright Copyright@2011, Qsan Technology, Inc.