NERSP: Saving File Space and Reducing
UFIT EI&O
Document ID: D0100
Last Updated: 07/03/2002

This document describes methods of managing and conserving file storage space on CNS's NERSP UNIX system. These methods include periodic purging of unneeded files and e-mail messages, and compressing infrequently accessed files so that they take up less space.

2046 NE Waldo Rd, Suite 2100
Gainesville Florida 32609-8942
(352) 392.2061
<editor@cns.ufl.edu>
UF Information Technology
Table of Contents

Introduction
Housecleaning--Deleting Unnecessary Files and Messages
File Compression
For More Information

Introduction

As with its other systems, CNS charges [http://docweb.cns.ufl.edu/docs/d0001/d0001.html] users for files stored on its NERSP system. Although NERSP storage rates are currently considerably lower than those of our other systems, you may still wish to find ways to reduce your storage charges.

Even if you have an account for which you do not pay directly, such as a Basic Access account (administered and funded by CIRCA), daily storage charges are billed to your userid (in addition to charges such as CPU time and dial-up server connect time). Even though you do not receive a bill each month for a Basic Access account, the funding for each account is limited to a fixed amount, so you will want to make sure you do not exceed that amount. If you do, you will find yourself unable to log on until the next month's funding is allocated!

Note: Users considering using the UF/CNS dial-up service should be aware that this service is under review, and may possibly be discontinued at or shortly after the end of calendar year 2006. For more information, please see Dr. Hoit's memo to Deans, Directors and Department Heads of 05/02/2006, titled Charging for UF Dialup Services [http://www.admin.ufl.edu/ddd/default.asp?doc=11.11.1920.1].

NERSP does not currently impose any upper limit on the amount of disk space a user can occupy. Consequently, without careful file management, storage charges will continue to grow as your disk storage usage increases.

Housecleaning--Deleting Unnecessary Files and Messages

Cleaning Up Old Files

Your first approach should be to delete any files you don't need to keep. Periodic "housecleaning" of your directories and e-mail folders is important, to clear out old, unneeded files and messages. As with any "housecleaning" chore, putting this task off too long can result in an unmanageable mess which becomes more difficult to clean up with each additional day of procrastination. And all the while, charges continue to grow, day by day.

Determine a good time (perhaps every semester break, when many campus offices have a brief "lull" in business) and make it a habit to spend an hour or so reviewing your files and old e-mail messages, to see what you can delete.

Use the command:

    ls -l | sort -nrk5 | more

...to review your files; this command sorts your files by size, with the largest FIRST. Take particular note of the file size column (immediately to the left of the file date column).

Use the rm command to delete any files which you no longer need; for example:

    rm filename

Use the cd command to change directories:

    cd directoryname

...and then repeat the ls command given above, to check the contents of those directories and clean them up as well.

NOTE: do NOT delete files from the "mail" subdirectory, unless you are sure you know what you are doing. The "mail" subdirectory contains your e-mail folders. Most users should use Pine to review these, selectively deleting old, unwanted messages.

Cleaning Up Old E-Mail Messages

Use Pine to review all your e-mail folders and delete unneeded e-mail messages:

1. From the Pine Main Menu, select "L Folder List".
2. Select each folder in turn, reviewing the messages stored in it, and delete the ones you no longer need to keep.

Pay particular attention to folders named "sent-mail-month-year". These are where Pine automatically stores copies of every piece of e-mail you send. Experience has shown that many of these messages are not of long-term significance, and may be deleted.

Note: State Public Records laws and institutional policy may prescribe specific required retention periods for certain types of e-mail messages. If you are uncertain as to the policies governing retention/disposition of your e-mail messages, check with the Records Management office at your institution.

File Compression

Sometimes there may be files, particularly LARGE files, which you don't access often but still wish to keep on-line while minimizing your storage charges. The way to do this is by compressing those files.

File compression techniques use mathematical algorithms to store the exact same data in fewer bytes. Naturally, each compression algorithm has a complementary "decompression" algorithm, which can restore the file exactly, to its original form. The file compression/decompression algorithms available for use on NERSP have been proven safe and reliable through years of use, and the underlying algorithms are well-known to computer scientists as being efficient and robust.
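
To see for yourself that compression restores a file exactly, you might try a round trip on a scratch file before compressing anything you care about. The following is only a minimal sketch: it uses the gzip and gunzip utilities described later in this document, and it assumes that mktemp and cmp are available, as they are on most UNIX systems. The file names (sample.txt, sample.orig) are hypothetical.

```shell
#!/bin/sh
# Sketch: show that gzip compression is lossless by round-tripping a scratch file.
# All file names are hypothetical; nothing in your own directories is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a repetitive sample text file (text of this kind compresses well).
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

cp sample.txt sample.orig                    # reference copy for the comparison
orig_size=$(( $(wc -c < sample.txt) ))       # size in bytes before compression

gzip sample.txt                              # replaces sample.txt with sample.txt.gz
gz_size=$(( $(wc -c < sample.txt.gz) ))      # size in bytes after compression

gunzip sample.txt.gz                         # restores sample.txt, removes the .gz file

# cmp -s exits with status 0 only when the two files are byte-for-byte identical.
if cmp -s sample.txt sample.orig; then
    roundtrip=identical
else
    roundtrip=different
fi

echo "original: $orig_size bytes, compressed: $gz_size bytes, restored copy: $roundtrip"

# Remove the scratch directory.
cd / && rm -r "$workdir"
```

The reference copy exists only so that cmp has something to compare against; in everyday use you would not keep one.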

Compressed files cannot be directly read or edited by normal means; you must generally "decompress" them first. However, for files which you only access occasionally, the inconvenience of having to decompress them may be more than offset by the savings in storage charges. For a large text file, compression may be quite effective; for example, while researching this article, we compressed a 504228 byte text file down to 154359 bytes, a savings of almost 70%!

Compressing a Single File

While there are several compression utilities available on NERSP, we recommend that you use the GNUZIP utilities (gzip and gunzip). These are fast, efficient, and effective; that is to say, they achieve a high degree of file compression, relative to some of the other utilities. The command:

    gzip filename

...will compress the individually named filename, creating a new file named filename.gz, and automatically delete the original filename.

Reversing the Process: Recovering a Compressed File

To restore the file, use the command:

    gunzip filename

Compressing Multiple Files into a Single Archive

It is also possible to compress multiple files into a single compressed archive file. You might wish to do this for long-term storage of a group of related files, which you will probably wish to access at the same time; for example, a program which you only use occasionally, and its associated data files.

This is a 3-step process involving another program called gtar (GNU tape archive). While the classic UNIX tar command, and the more recent gtar, were originally developed for the purpose of archiving files to magnetic tape, they are also useful for managing online disk storage. The gtar program integrates the functions of the standard UNIX tar command and the gzip utility into a single command which combines multiple files into a single archive file, and compresses that using the gzip algorithm.

1. Make sure all the files you want to archive as a group are in a directory together, with no other files. If the files you want to archive are already together in a directory, you can skip this step and proceed directly to step 2.
   a. If necessary, create a new, empty directory (using the mkdir command) to hold the files you want to archive. (e.g. mkdir mystuff)
   b. Move all the files you want to archive in a group into the newly created directory using the mv command. (e.g. mv myprogram mystuff)
2. "Tar together" the directory and all files in it using the gtar command, as follows:

       gtar -zcf myarchive.tgz mystuff

   This command says, "create a file named 'myarchive.tgz' which includes the directory 'mystuff' and all the files in it." The gtar command also compresses the resultant archive, to save storage space. Note: ".tgz" is a combination of the suffixes ".tar" and ".gz".
3. Use the rm -r directoryname command to delete the directory and all the files in it. (e.g. rm -r mystuff)

Reversing the Process: Recovering Your Files

The following 2-step process will decompress the archive, restore the archived files to a subdirectory under the current working directory, and remove the archive file.

1. gtar -zxvf myarchive.tgz
2. rm myarchive.tgz

Cross-Platform Compatibility: "ZIP" on NERSP

Sometimes you may wish to compress files in a way which is compatible with compression utilities commonly available on personal computers. Perhaps the most widely used compression algorithm on microcomputers is the ZIP format, often found as "PKZIP" or "WINZIP," on MS-DOS and MS-Windows computers. The popular "StuffIt" utility for Macintosh also understands the ZIP format.

CNS has installed the utilities "zip" and "unzip" on NERSP for the purpose of providing cross-platform compatibility for compressed files. The zip and unzip utilities on NERSP use the standard ZIP compression algorithm, and are fully compatible with the commonly used personal computer ZIP utilities (such as PKZIP v2.04g).

To create a ZIP-format compressed file on NERSP, issue the command:

    zip -r myarchive mystuff

...where myarchive.zip is the name you wish to give the newly created archive file, and mystuff is the name of a directory to be "zipped." The "-r" ("recursive") option specifies that the input file is a subdirectory, and you wish the archive to include all files in the subdirectory (as well as all subdirectories and files contained therein, recursively).
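
The recursive zip command above can be tried on a scratch directory before you use it on real files. This is a rough sketch, not a prescribed procedure: the directory layout and file names are hypothetical, the -q (quiet) option and the unzip -l listing are additions not covered in this document, and the script assumes mktemp is available. It skips itself if zip or unzip is not installed.

```shell
#!/bin/sh
# Sketch: zip a scratch directory recursively and list the archive contents.
# Directory and file names are hypothetical examples.
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip/unzip not installed here; skipping"
    zip_demo=skipped
else
    workdir=$(mktemp -d)
    cd "$workdir" || exit 1

    # A small directory tree with one nested subdirectory.
    mkdir -p mystuff/data
    echo "example program" > mystuff/myprogram
    echo "example data"    > mystuff/data/input.txt

    # -r descends into mystuff and every subdirectory below it;
    # -q suppresses the per-file progress messages.
    # zip appends the .zip suffix to the archive name itself.
    zip -r -q myarchive mystuff

    # List what the archive holds without extracting anything.
    unzip -l myarchive.zip

    # Record whether the archive exists and the source directory is still there.
    if [ -f myarchive.zip ] && [ -d mystuff ]; then
        zip_demo=ok
    else
        zip_demo=failed
    fi

    # Remove the scratch directory.
    cd / && rm -r "$workdir"
fi
```

Listing the archive is a quick way to confirm that files in subdirectories were picked up.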

If you only wish to compress a single file using the ZIP format, you can shorten the command to:

    zip myarchive mystuff

...where mystuff is now the name of a single file rather than a directory.
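
Because zip leaves the original file in place, one cautious pattern is to test the new archive before removing the original by hand. The sketch below illustrates that idea under stated assumptions: the file name report.txt is hypothetical, and the unzip -t (test) and -q (quiet) options are not covered in this document, so check "man unzip" on your system before relying on them. It skips itself if zip or unzip is not installed.

```shell
#!/bin/sh
# Sketch: zip one scratch file, test the archive, then remove the original.
# File names are hypothetical examples.
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip/unzip not installed here; skipping"
    cleanup_demo=skipped
else
    workdir=$(mktemp -d)
    cd "$workdir" || exit 1
    echo "rarely used report" > report.txt

    # Creates myarchive.zip; report.txt itself is left untouched.
    zip -q myarchive report.txt

    # unzip -t checks every archive member and exits with status 0 only if
    # all of them pass, so the original is removed only after a clean test.
    if unzip -t -q myarchive.zip; then
        rm report.txt
    fi

    # Record whether the archive exists and the original is gone.
    if [ -f myarchive.zip ] && [ ! -e report.txt ]; then
        cleanup_demo=ok
    else
        cleanup_demo=failed
    fi

    # Remove the scratch directory.
    cd / && rm -r "$workdir"
fi
```

Making the rm conditional on the test means a damaged archive never costs you the only copy of the file.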

Note: Unlike the gzip command, the default action of the zip command does NOT automatically delete the "source" files after creating the archive. You will need to do that yourself, using the rm command.

To unzip a zip-format file on NERSP (whether created on NERSP, or on a PC), the command is simply:

    unzip myarchive.zip

For More Information

For more information on basic UNIX file management (including the ls, cd, and more commands), and on using the Pine mail system, see CNS UNIX 101 (available on the Web at http://www.cns.ufl.edu/info-services/handouts/unix101/ [http://docweb.cns.ufl.edu/docs/d0107/d0107.html] or from CNS Information Services, 112 SSRB, 392-2061 <consult@lists.ufl.edu>).

For more information on the compression/decompression commands discussed in this document (including many switches and options available), see the "man" pages for those commands (e.g. "man gzip", "man gunzip", "man gtar", "man zip" and "man unzip").

Your Comments are Welcome

We welcome your comments and suggestions on this and all UFIT documentation. Please send your comments to:

UFIT
2046 NE Waldo Rd, Suite 2100
Gainesville Florida 32609-8942
(352) 392.2061
<editor@cns.ufl.edu>