Dec. 3rd, 2013

Advanced Knowledge and Understanding of Industrial Data Storage

By Jesse Chuang, Senior Software Manager, Advantech

With the popularity of computers and networks, most enterprises and organizations need to store documents, files and other types of information. Since IBM introduced the first hard disk drive (HDD) in 1956, storage technology has made great progress and now plays an essential role in information technology. In enterprise applications, HDDs are used in multi-user computers running a variety of tasks such as transaction processing databases, computing software and storage management software. Driven by fast-growing, data-intensive network services, the number and size of files have grown substantially and information exchange has become more time-critical. As a result, existing data storage systems face many challenges: HDDs must operate continuously in demanding environments while delivering the highest possible performance without sacrificing reliability. Choosing the appropriate storage system is therefore key to establishing an effective and flexible storage architecture. This paper provides an overview of storage systems and the related concepts to help you determine which solution will work best for your business.

Application Trends of Storage Systems

In simple terms, a storage system provides data storage services for a host computing system. It generally contains three major parts: a physical storage medium that stores digital data, a standard interface that connects to the computing system, and a storage control unit that manages data transfer and disks. Taking an enterprise-grade storage system as an example, the HDD is the main component that stores the digital data, and the most significant evolution in recent years is that the transfer interface has changed from the original parallel bus to a serial bus architecture.
Today, SATA (Serial Advanced Technology Attachment) and SAS (Serial Attached SCSI) are the mainstream disk drive interfaces. SAS-based HDDs are currently used for enterprise storage solutions, but a growing number of enterprises have started using hard drives with SATA interfaces to reduce implementation costs, thanks to their low price and continuously improving performance (up to 6 Gbps). The interface also serves as the data access channel between storage device and host, so the choice of storage interface affects transmission speed and stability. Unlike a Direct Attached Storage (DAS) system, which uses SAS to connect to the host computing system over a limited distance (within 10 meters), a Network Attached Storage (NAS) system is connected to workstations and servers over Ethernet, so it can leverage network-based technologies to support remote storage.

As mentioned earlier, enterprises have primarily adopted the HDD as a storage medium. However, an HDD uses a mechanical arm with a read/write head that moves to read information from the right location on a storage platter. Such mechanical movement can easily cause vibration problems, and it limits how far HDD access performance can be improved. Consequently, flash memory in the form of the Solid State Drive (SSD), which has no moving parts, is welcomed by the market for its durability, speed and power consumption.

Common Storage Systems

By using different control methods, a storage system can provide varied functions. Classified by the features of their storage services, four kinds of storage systems are currently the most prevalent on the market: JBOD, RAID, NAS and SAN. Each is described below.

As the name implies, JBOD (Just a Bunch of Disks) is an array of hard disks that are merely concatenated together and have not been configured with any redundancy mechanism. Although it only controls multiple independent disks at the same time, without advanced features, many companies still use it to expand storage space because it provides a large amount of capacity to the host through a single interface at a lower cost than the other system options.

The second one is RAID (Redundant Array of Independent Disks).
The basic idea is to combine multiple inexpensive drives into a disk array group whose performance can match or exceed that of a single expensive, high-capacity hard drive. To the host system, the array appears to the operating system as one single drive. Depending on the selected level, RAID can offer benefits over a single hard disk, such as enhanced fault tolerance and increased data processing efficiency. The different schemes or architectures are named by the word RAID followed by a number; the common levels are RAID-0/1/5/6/10/50/60.

As described earlier, NAS is a network-attached storage system and provides a cost-effective way to add hard disk space to a network. NAS is usually connected to the host system via Ethernet, which is much cheaper than other interfaces and is the general connection interface in enterprise network environments. Because NAS supports networked, long-distance connections (compared with DAS), the storage system and host system do not need to be placed in the same location, which enhances enterprise information security. In addition, the basic unit of data access differs between DAS and NAS: the former transfers data at block level and the latter shares data at file level, so NAS is generally known as file-based storage.

The last one is SAN (Storage Area Network), which, like NAS, transfers data between servers and storage devices through a network. Thanks to the network, a user can store or back up all data to a storage system located in another building, so that disaster recovery can be done within minutes rather than hours or days, and information remains accessible after a major incident such as a fire or flood. In contrast to NAS, the host system treats a SAN like a DAS system because it uses the block as the basic unit of storage. In other words, this block-based storage combines the advantages of DAS and NAS: it allows multiple host systems to use the storage space without file-based limitations. Compared with the traditional approach of using a fiber optic network (typically Fibre Channel), the Ethernet network interface for SAN has become increasingly popular due to its competitive price and the widespread use of Ethernet in enterprises and organizations. Such a SAN is called an "IP-SAN" since its network protocol is based on IP (Internet Protocol).
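
To make the trade-off between the JBOD and RAID levels named in this section more concrete, the following minimal sketch estimates usable capacity for an array of equally sized drives. It relies on the commonly quoted textbook formulas rather than on any particular controller; the function name `usable_capacity_tb` and its parameters are hypothetical, and RAID-50/60 are left out because their capacity depends on how the drives are grouped.

```python
def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB for `disks` equally sized drives (textbook formulas)."""
    if disks < 1 or disk_tb <= 0:
        raise ValueError("need at least one disk with positive capacity")

    # Minimum drive counts commonly quoted for each layout (an assumption of this sketch).
    minimum = {"JBOD": 1, "RAID-0": 2, "RAID-1": 2, "RAID-5": 3, "RAID-6": 4, "RAID-10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} disks")

    if level in ("JBOD", "RAID-0"):
        return disks * disk_tb          # no redundancy: all capacity is usable
    if level == "RAID-1":
        return disk_tb                  # every drive holds the same mirrored copy
    if level == "RAID-5":
        return (disks - 1) * disk_tb    # one drive's worth of parity
    if level == "RAID-6":
        return (disks - 2) * disk_tb    # two drives' worth of parity
    # RAID-10: striped mirrored pairs, so half of the raw capacity is usable.
    if disks % 2:
        raise ValueError("RAID-10 needs an even number of disks")
    return (disks // 2) * disk_tb


if __name__ == "__main__":
    for level in ("JBOD", "RAID-0", "RAID-5", "RAID-6", "RAID-10"):
        print(f"{level:8s} 8 x 4 TB -> {usable_capacity_tb(level, 8, 4.0)} TB usable")
```

The numbers only illustrate the capacity side of the choice; fault tolerance and performance also differ between levels and depend on the actual implementation.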
The Advanced Features of SAN

From the above description of the storage system types, SAN appears to be the best choice for organizations with rapidly increasing IT needs, such as video surveillance, broadcasting, medical image processing, cloud computing and big data processing, that want to maximize the utilization of their storage equipment. The following sections introduce the major features of SAN.

Thin Provisioning

One of the benefits of installing a SAN is better disk utilization through "Thin Provisioning", a mechanism that presents a host with virtual-capacity volumes and allocates space to servers on a just-enough, just-in-time basis. Because blocks are allocated on demand instead of all up front as in the traditional method, enterprises can maximize the value of their storage investment. For example, a user may expect to need 1 TB of storage space but currently use only 100 GB of physical disk capacity; Thin Provisioning lets the system present the full virtual disk space to the user and incrementally add physical disks to the underlying storage pool only when the user really needs them. This flexibility lets you utilize storage space more effectively than hundreds or thousands of partially utilized local disks that waste power and generate heat in your data center.

Storage Snapshot

Unlike a traditional full backup, which may take a long time to complete, a Storage Snapshot takes very little time because it uses a set of reference markers, or pointers, to data stored in the SAN. By applying a copy-on-write method to entire block devices and copying the original contents of changed blocks to other storage, a Storage Snapshot preserves a self-consistent past image of the block device while consuming little disk capacity, since a snapshot only needs enough storage to hold the data that changes. Additionally, if your server runs applications with critical information that must be recovered quickly when a problem occurs, Storage Snapshot can provide speedy recovery.
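
The two mechanisms above can be sketched together in a few lines. The toy model below is an illustration under simplifying assumptions, not how any specific SAN product works: the class `ThinVolume` and all of its methods are hypothetical names, blocks are held in an in-memory dictionary, and a snapshot stores only the old contents of blocks that are overwritten after it was taken.

```python
from typing import Dict, List, Optional


class ThinVolume:
    """Toy thin-provisioned block volume with copy-on-write snapshots."""

    def __init__(self, virtual_blocks: int) -> None:
        self.virtual_blocks = virtual_blocks          # capacity advertised to the host
        self.blocks: Dict[int, bytes] = {}            # physical blocks, allocated on first write
        self.snapshots: List[Dict[int, Optional[bytes]]] = []  # old contents saved per snapshot

    def snapshot(self) -> int:
        """Create a snapshot. Nothing is copied yet, so this is almost instant."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index: int, data: bytes) -> None:
        if not 0 <= index < self.virtual_blocks:
            raise IndexError("block outside the virtual capacity")
        # Copy-on-write: before overwriting, preserve the old content in every
        # snapshot that has not saved this block yet (None = block was unallocated).
        old = self.blocks.get(index)
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = old
        self.blocks[index] = data                     # thin provisioning: allocate on demand

    def read(self, index: int) -> Optional[bytes]:
        return self.blocks.get(index)

    def read_snapshot(self, snap_id: int, index: int) -> Optional[bytes]:
        """Read a block as it was when the snapshot was taken."""
        saved = self.snapshots[snap_id]
        if index in saved:
            return saved[index]                       # block changed since the snapshot
        return self.blocks.get(index)                 # unchanged: share the live block

    def allocated_blocks(self) -> int:
        return len(self.blocks)


if __name__ == "__main__":
    vol = ThinVolume(virtual_blocks=1000)             # host sees 1000 blocks
    vol.write(0, b"alpha")
    vol.write(1, b"beta")
    snap = vol.snapshot()
    vol.write(1, b"BETA")                             # triggers copy-on-write for block 1
    print("allocated:", vol.allocated_blocks(), "of", vol.virtual_blocks)
    print("live:", vol.read(1), "snapshot:", vol.read_snapshot(snap, 1))
```

In this sketch the volume advertises 1000 blocks but consumes physical space only for blocks that were actually written, and the snapshot grows only with the blocks that change after it is created.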
Data Deduplication

Data Deduplication is a specialized data compression technique that eliminates duplicate data so that only one unique instance of the data is actually retained on disk. Since redundant data is replaced with a pointer to the unique copy, this function improves storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. Combining deduplication with traditional compression can reduce the stored data to as little as one tenth of its original size, so a system can save more storage space while maintaining data integrity.

Scale-out Storage

To establish a flexible and high-performance storage system, a scale-out storage architecture over Ethernet adds more nodes to a system horizontally, which makes it ideally suited to virtualization, cloud and big data environments. Scale-out storage differs conceptually from the older scale-up approach, which only adds storage capacity vertically behind the same head end and whose performance decreases significantly when the load is too heavy. In a scale-out environment, by contrast, capacity, computing and network connectivity are scaled equally, so performance can keep scaling linearly as more units are added. In addition, the host can freely access any storage space, and the nodes can communicate with each other as well.

Conclusion

As businesses continue to grow, their storage systems need to be upgraded, and the rise of emerging applications (such as cloud systems, Video-On-Demand services, high definition image files, and real-time and interactive data exchanges) makes massive data management a serious problem.
An industrial storage solution is well suited to enterprises, especially in non-consumer applications: it makes it easy to increase capacity with better performance and reliability, and it maximizes storage management efficiency through effortless backup and maintenance, while providing long-term investment protection for your company.