Support for Storage Volumes Greater than 2TB Using Standard Operating System Functionality

Introduction

A History of Hard Drive Capacity

Starting in 1983, when IBM introduced a 10MB hard drive in the IBM PC-XT personal computer, drive capacities have been increasing at an exponential rate. To keep up with this growth, the PC software infrastructure has had to continually evolve to accommodate increasing capacity limits. First there was a 512MB limit, followed by limits at 2GB, 8GB, and then 32GB. At each step, changes were made to the BIOS, operating system, drivers, or firmware to support the ever larger drives.

Where Are We Going?

With 250GB Serial ATA (SATA) drives readily available, 400GB drives just starting to ship, and 500GB drives right around the corner, the next big limit will be 2TB. Although it seems as though drive capacities will have to grow by 4X to hit this limit, this calculation doesn't take RAID into account. With RAID, large 400GB drives can easily be combined into a logical drive that exceeds 2TB. For example, the Adaptec 21610SA 16-port SATA RAID controller can currently support 6.4TB of total storage with sixteen 400GB drives, or 8TB with 500GB drives.

The Problem: The 2TB Limit

Blocks and Bytes

All modern drives organize storage into blocks, which are typically 512 bytes. The Small Computer Systems Interface (SCSI) standard, developed in the early 1980s, defined a structure called a Command Descriptor Block (CDB) that was 10 bytes long. The CDB, in turn, contained a block number field that was 4 bytes long. SCSI, Serial Attached SCSI (SAS), and Fibre Channel (FC) drives are all based on this SCSI standard, as are most operating system storage stacks. A 4-byte block number field is large enough to represent 4,294,967,296 unique blocks, or a total of 2TB of storage if those blocks are 512 bytes each. When the SCSI standard was developed, the possibility of exceeding 2TB per disk seemed a distant problem.
Today, however, this 2TB volume size limitation affects not only internal drives, but also external disk arrays, such as Fibre Channel and iSCSI. Fortunately, the SCSI committee had the foresight to also define a 16-byte CDB containing an 8-byte block number, allowing approximately 9.4 zettabytes (ZB) of storage to be referenced, or roughly nine billion TB. Since IBM projects that the total volume of all online storage on private and public networks will reach 1 yottabyte (YB) in 2010, a single logical drive of that scale is as difficult to imagine today as a 2TB drive seemed twenty years ago. However, most major operating system and hardware vendors recognize that it's important to go beyond the 2TB limit, and are in the process of adding support for 16-byte CDBs. In the meantime, this paper explains how to work around the 2TB limit.
The Solution: Operating System Virtualization

Many Operating Systems Already Support Volumes Greater than 2TB

The previous section discussed the ability to reference disks of different sizes through a specific disk interface. But it takes more than an interface to access large drives. The complete software stack used to access data consists of several layers that all need to correctly support large block counts, as shown below.

Figure: The storage software stack (filesystem, storage stack, driver, BIOS, firmware)

Commonly, the firmware, BIOS, driver, and at least part of the storage stack support only 32-bit block numbers, thereby limiting the storage to 2TB. However, since most filesystems support block sizes larger than 512 bytes, they actually already support volumes greater than 2TB. In other words, a filesystem may have only 32-bit block numbers, but those blocks are commonly multiples of 512 bytes, such as 2KB, 4KB, or 8KB, allowing the volume size to be 8TB, 16TB, or 32TB, respectively. Of course, the factors that determine the maximum volume size are more complicated than this, but block size is certainly an important one.

With drivers presenting volumes of up to 2TB and filesystems supporting volumes greater than 2TB, one piece of the puzzle is missing: a method for combining smaller drive volumes into larger filesystem volumes. This is possible with the several operating systems (OSes) that have a virtualization layer in the storage stack. Just like the array virtualization in a PCI RAID controller, the OS virtualization layer can combine smaller volumes (disks) into a larger volume (virtual disk) for improved performance and increased capacity. This low-overhead, high-performance operating system layer is the key to the solution presented in this paper.

One recommended method to implement this solution, available on Adaptec RAID controllers, is to have multiple 2TB arrays span all the drives, as shown in the following figure.
In this example, the left-most 2TB array uses the top portion of each drive, and the right-most 2TB array uses the bottom portion of each drive. Of course, this example can be extended to have as many 2TB arrays as the available drive space will permit, with the operating system virtualization layer used to combine them all into a single entity.

The following sections describe in detail how to configure the 2TB arrays and the virtualization layers for the Windows and Linux operating systems.

Exceeding 2TB with Adaptec Storage Manager

Before configuring the operating system, configure the arrays. After performing the steps in this section, jump to the section with the steps for the appropriate target operating system.

In this example, a system called DataTub is configured with an Adaptec SATA RAID 21610SA controller attached to eleven 250GB SATA drives, as shown below in a screen shot from Adaptec Storage Manager (ASM). The goal is to create a single operating system volume using the entire 2.5TB of disk space.
Select all of the drives and begin creating 2TB arrays until all of the storage is consumed. ASM defaults to the maximum array size, shown below as 2047.998GB, or 2TB. Unless the total storage is an exact multiple of 2TB, the last array created will have a smaller size; in this case, the last array is 512.778GB. For this example, the entire storage of the eleven drives was converted into two arrays, one with a capacity of 2TB and the other with a capacity of 0.5TB. At this point in the procedure, go to the section with the steps for the preferred operating system, either Windows or Linux.

Microsoft Windows

This section covers the steps used in the following Microsoft operating systems:

Windows 2003 Server
Windows XP Pro
Windows 2000 Pro

All versions of these operating systems support virtual drives of up to 512TB with the current filesystem. Now that the storage is available to the operating system as multiple arrays, the Windows Disk Management Console will be used to combine the arrays into one large virtual drive exceeding 2TB. As shown below, Windows sees the 2TB and 0.5TB disks as Basic and Unallocated.

The next step is to convert the basic disks to dynamic disks by right-clicking on the new, unallocated disks and selecting them in the window below. After the conversion, the disks will be marked as dynamic.

The disks must then be formatted. This is the stage where they will be combined to form one large volume. Right-click on the unallocated disks and start the New Volume Wizard. Select Spanned, which groups all disks, regardless of their size, into a single logical entity. Do not select Simple, which creates a volume on only one of the unallocated disks, or Striped, which forces all disks to have an equal capacity.

All of the unallocated storage arrays must now be moved from the Available column to the Selected column, as shown below. Continue to step through the wizard, selecting the drive letter, format type, etc. After the format completes, the Disk Management Console will show the formatted volume, still as separate disks but sharing the same drive letter, G: in this case. Right-clicking on the new volume and selecting Properties shows the new volume to have a capacity of 2.5TB.

Creation of a Windows volume greater than 2TB is now complete, and the storage is available for use.
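For administrators who prefer the command line, the same spanned-volume configuration can be sketched with the diskpart utility instead of the Disk Management Console. This is a hedged sketch, not a tested recipe for this exact setup: the disk numbers 1 and 2 and the drive letter G are assumptions taken from this example, and should be confirmed with list disk before running anything.

```
rem Convert both array disks to dynamic
select disk 1
convert dynamic
select disk 2
convert dynamic
rem Create a simple volume on the first disk, then extend it onto the
rem second; extending onto a second dynamic disk produces a spanned volume
create volume simple disk=1
extend disk=2
assign letter=G
```

Afterward, format the resulting G: volume with the standard Windows format command, since diskpart of this era does not format volumes itself.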
Linux

This section covers the steps used in the following Linux operating systems:

SuSE v9.1 and v9.2
SuSE Linux Enterprise Server (SLES) v9 and v9sp1
RedHat Fedora Core (FC) v2 and v3
RedHat Enterprise Linux (EL) v4
Mandrake v10 and v10.1

Any other Linux release using the 2.6 kernel will have the same basic support and setup steps when using the command prompt. The steps using the graphical user interface differ between distributions and therefore are not documented in this paper.

Each Linux filesystem has its own inherent size limit, assuming the latest patches are installed, as shown below:

ext2 and ext3 = 16TB
ReiserFS = 16TB
XFS = 9,000,000TB
JFS = 4YB

The Logical Volume Manager (LVM) will be used to create volumes exceeding 2TB. In this example, the operating system is SLES9 and the system name is DataTub. First, use lvmdiskscan to identify the unused storage.

DataTub:~ # lvmdiskscan
  /dev/sdb  [2.00 TB]
  /dev/sdc  [512.77 GB]
  0 LVM physical volume whole disks
  0 LVM physical volumes

In this case, devices sdb and sdc are available and have not yet been configured for LVM use. These devices, which are actually arrays presented by the Adaptec controller, must first have a physical volume label created using pvcreate. Note that the number of sd* devices depends on how much storage is available and how many 2TB arrays have been created; repeat these steps for each sd* device.

DataTub:~ # pvcreate /dev/sdb
  No physical volume label read from /dev/sdb
  Physical volume "/dev/sdb" successfully created
DataTub:~ # pvcreate /dev/sdc
  No physical volume label read from /dev/sdc
  Physical volume "/dev/sdc" successfully created

Performing a physical volume scan with pvscan shows that both devices are now available and that the total size of 2.5TB is not yet in use.
DataTub:~ # pvscan
  PV /dev/sdb  lvm2 [2.00 TB]
  PV /dev/sdc  lvm2 [512.77 GB]
  Total: 2 [2.50 TB] / in use: 0 [0] / in no VG: 2 [2.50 TB]

Also, the LVM disk scan now shows the two physical volumes available as whole disks.

DataTub:~ # lvmdiskscan
  /dev/sdb  [2.00 TB]  LVM physical volume
  /dev/sdc  [512.77 GB]  LVM physical volume
  2 LVM physical volume whole disks
  0 LVM physical volumes

Use vgcreate to create a volume group (VG) called big that combines these two physical volumes into a single pool.

DataTub:~ # vgcreate big /dev/sdb /dev/sdc
  Volume group "big" successfully created

Scanning the volume groups with vgscan shows that big has been successfully created.

DataTub:~ # vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "big" using metadata type lvm2

Likewise, pvscan now shows that both physical volumes are consumed by a single volume group.

DataTub:~ # pvscan
  PV /dev/sdb  lvm2 [2.00 TB]
  PV /dev/sdc  lvm2 [512.77 GB]
  Total: 2 [2.50 TB] / in use: 2 [2.50 TB] / in no VG: 0 [0]

Use lvcreate to carve a logical volume (LV) out of this volume group. The -L parameter specifies the size of the volume, in this case the total of the physical volumes, 2.5TB. The -n parameter names the volume bigvol. The last parameter, big, is the volume group from which the logical volume is created.

DataTub:~ # lvcreate -L 2.50TB -n bigvol big
  Logical volume "bigvol" created

Scanning with lvscan shows that a new 2.5TB logical volume has been created.

DataTub:~ # lvscan
  ACTIVE  /dev/big/bigvol [2.50 TB] next free (default)

Finally, a filesystem must be created on the new 2.5TB volume with the preferred mkfs variant. This step may take several minutes to complete on large volumes.

DataTub:~ # mkfs /dev/big/bigvol

Lastly, the new filesystem needs to be mounted using mount. In this case it is mounted on the /mnt/big directory (create the directory first if it does not exist).

DataTub:~ # mount /dev/big/bigvol /mnt/big

Running df shows the mounted 2.5TB volume.
DataTub:~ # df -h
Filesystem              Size  Used  Avail  Use%  Mounted on
/dev/mapper/big-bigvol  2.5T  20K   2.4T   1%    /mnt/big

To make the mount occur automatically at each boot, add the new device to the /etc/fstab file.

Creation of a Linux volume greater than 2TB is now complete, and the storage is available for use.
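For illustration, the /etc/fstab entry for this example might look like the line below. The ext3 type is an assumption, since mkfs above was run with its defaults; set the third field to match the filesystem actually created.

```
/dev/big/bigvol   /mnt/big   ext3   defaults   1   2
```

The last two fields control dump backups and the fsck pass order; 0 0 is also common for data volumes.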
Other Considerations

While this paper has shown an approach for creating large volumes, or logical unit numbers (LUNs), careful storage planning is required before using these techniques. Large volumes can be beneficial, but they also come with their own problems. For example, it can take a very long time to rebuild an extremely large LUN after a drive failure, or to run a filesystem check after a crash. Backup times for volumes of this size can also be considerable. Breaking up RAID sets and volumes into smaller, manageable chunks for backup can alleviate these problems by allowing the operating system to perform multiple tasks in parallel instead of a single very large task.

In Summary

This paper shows how to achieve volumes larger than 2TB, for applications that require them, without any special controller software or firmware. Simple virtualization of volume sizes greater than 2TB is possible using current operating systems such as Microsoft Windows 2000/2003/XP and Linux with 2.6 and later kernels, with no performance penalty for using host OS virtualization. With drive spanning, disk volumes of varying sizes can be grouped into logical, virtualized volumes, allowing storage to be added incrementally without massive reconfiguration. The same 2TB limitation also exists in some external disk arrays; in these cases, drive spanning, or virtualization at the host, can likewise be used to build volumes larger than 2TB. Breaking up both RAID sets and volumes into smaller, manageable chunks can alleviate inordinately long rebuild, backup, and disk maintenance times.

The suggestions in this paper are interim steps. The growth in drive capacities will drive both operating systems and storage arrays to support storage volumes greater than 2TB natively in the very near future.

Adaptec, Inc.
691 South Milpitas Boulevard Milpitas, California 95035 Tel: (408) 945-8600 Fax: (408) 262-2533 Literature Requests: US and Canada: 1 (800) 442-7274 or (408) 957-7274 World Wide Web: http://www.adaptec.com Pre-Sales Support: US and Canada: 1 (800) 442-7274 or (408) 957-7274 Pre-Sales Support: Europe: Tel: (32) 2-352-34-11 or Fax: (32) 2-352-34-00 Copyright 2005 Adaptec Inc. All rights reserved. Adaptec and the Adaptec logo are trademarks of Adaptec, Inc., which may be registered in some jurisdictions. All other trademarks used are owned by their respective owners. Information supplied by Adaptec Inc., is believed to be accurate and reliable at the time of printing, but Adaptec Inc., assumes no responsibility for any errors that may appear in this document. Adaptec, Inc., reserves the right, without notice, to make changes in product design or specifications. Information is subject to change without notice. P/N 666457-011 Printed in USA 03/05 3759_1.4