New Directions for Campus Accessibility: The AHEAD Institute on E-Text Production. Accessing Higher Ground Production Techniques

New Directions for Campus Accessibility: The AHEAD Institute on E-Text Production
Accessing Higher Ground 2006
Production Techniques

Gaeir Dietrich
High Tech Center Training Unit
California Community Colleges

Ron Stewart
Association on Higher Education And Disability
AHEAD E-text Initiative

AHEAD E-text Institute at AHG, November 7-8, 2006


Table of Contents
Steps in Creating E-text
Before You Start
Scanning
Scanning
Adjusting scanner values
File Structure
Create Folders for Scanning
Labeling
Advanced Scanning Techniques
Improving Your Scanning
Scanner Settings
Mode
Resolution
Brightness
Contrast
Threshold (Or where did my other bar go?)
Color Dropout
Gamma
Despeckle/Remove Dot
Erase Notch
Reverse Image
Static
Improving Accuracy
Adjust the Brightness and Contrast Sliders in the Scanner Panel
Identify Text/Graphic Blocks Correctly
Use High-Quality Images
Canon High Speed Scanners
Canon 5080C Scanner Device Setup
Canon 5080C Scanner Software Install
Scanner Settings
Color Dropout
Canon 9080C Scanner Setup
Canon 9080C Scanner Device Installation
Canon 9080C CapturePerfect Installation
Scanner Settings
Color Dropout
Scan Batch to File
Scanning with Separator Sheets (5080C)
Scanning with the Patch Code Sheets (9080C)
Scanning Separate Chapters
Improving the Scan
Brightness
Contrast
Color Drop-out Tip
Thin Paper
Glossy Paper
Creating Job Separator Sheets
Fujitsu Scanner
OCR Processing
Different languages
ABBYY FineReader
Interface
Options Set-up
OCR Process Beginning with an Image File (TIFF)
Step One: Open an Image
Step Two: Analyze Layout
Step Three: Adjust Blocks
Step Four: Read All
Step Five: Check Spelling
Step Six: Save the Document
FineReader Tips
MS Word
Cleaning up Hyphens
Knowing What Word Is Up To
Making the Changes When You Want Them
Understanding Styles
Formatting Individual Words/Phrases
Formatting with the Ruler
Columns
Tables
Creating a Template
Formatting Specifically for Large Print
Adjusting Styles
Creating a PDF
Formatting Specifically for Captioning
Kurzweil 3000
Steps for converting TIFF files to KESI with the automater.
Using Dolphin Producer to Create DAISY Books
Settings
Mark-up of the Book
Step 1: Get rid of textboxes
Step 2: Mark up the elements of the book
Step 3: Pictures can be included
Step 4: Save the DTB
Step 5: Burn the DTB
Dolphin Ease Publisher
Creating the DAISY Markup
Adding Headings and Chapters
Editing and Inserting Text for Graphics and Footnotes
Creating DAISY Without Graphics
Creating Captions As Skippable Text
Creating Alt-Text For Existing Graphics
Inserting Graphics
Inserting Footnotes
Creating the TTS
Building the Book
Saving and Testing the Complete DAISY Book
Handouts
Sources of E-text
Sources of E-text
Online Reference Resources
Category
Type
Web Site
Special Font
Features
Formatting E-text in Word: General Checklist
Short Cut Keys in Office
Change or resize the font
Apply character formats
View and copy text formats
Set line spacing
Align paragraphs
Apply paragraph styles
Inserting International Characters
ABBYY FineReader Hot Keys
File menu
Edit menu
View menu
Batch menu
Process menu
Tools menu
Window menu
Help menu
General

Steps in Creating E-text
1. Remove the binding from the text. (Check to see if your campus print shop has a "guillotine.")
2. Scan the book in chapters, unless you're creating a DAISY book.
3. Scan to a multipage TIFF file.
4. For Kurzweil, open the TIFF in the Kurzweil Scan/Read station and save it as a KES document. Once the file is in KES format, you can open it on any Kurzweil system.
5. For other e-text uses, such as Braille, run the TIFF through an OCR program.
6. Strip out headers and footers, but make sure to retain page numbers.
7. Delete graphics for now, but if possible, retain the captions for figures that are referenced in the text.
8. If you are doing a math or science book, don't worry if the equations get lost; you can add them later.
9. If you have a foreign-language text, make sure to set the OCR program's language feature to include that language.
10. Use the OCR program's proofreading capabilities to clean up the text as much as you can.
11. Save the resulting file in Microsoft Word format.
12. Use MS Word styles to prepare the document for Duxbury, large print, etc.
13. Burn the book onto CD and label it.
14. Rebind the student's book to return to him/her.

Before You Start
Get in the habit of checking Louis and the HTCTU Alternate Media Exchange (AMX) Database before ordering e-text and before producing Braille or recording a book. Someone else may already have done the work!
Louis database at www.aph.org
AMX at www.htctu.net

Fujitsu Scanner Sample 2 3/6/2006

Scanning
When you scan a black-and-white page, your scanner automatically creates a TIFF file, which is a graphic. The scanner works in essentially the same way that a copy machine does; in other words, the scanner takes a picture of the page. However, rather than making a hardcopy picture of the page, it makes a digital picture: a TIFF.

When you do your scanning, you can use a scanning utility (Scanning Utility 5000 comes with the Canon scanner) or a program such as Adobe Photoshop. You can also scan directly into Kurzweil. Scanning utilities are usually the best option for production because they can handle auto feed and multiple pages in a document. If you are scanning using Kurzweil, an excellent overview of TIFF scanning can be found at the following Web site: http://www.indiana.edu/~iubdrh/etext/k3000prep.htm

The ultimate use for your e-text will determine what you do with the TIFF file. If you want to use the TIFF in Kurzweil, you can open it on the Kurzweil Scan/Read station and save it in KES format. The KES file can then be opened on any Kurzweil station. If you are converting your file to electronic text in preparation for creating Braille or large print, you will need to extract the text from the picture by running optical character recognition (OCR) on the file using an OCR program. There are a number of OCR programs on the market, but OmniPage Pro and FineReader are two of the best.

We recommend scanning first to TIFF, then opening your document in the OCR program. This process is more stable than scanning directly into the OCR program, and it also lets you save your TIFF file. We recommend archiving the TIFF for later conversion to a different file format.

For those of you using the Canon 5080C, we have found that the best scanning setting for keeping the graphics is Black and White ED at 400 DPI. If you open the TIFF file in Kurzweil Scan/Read, you can enhance your results by using the Kurzweil "despeckle" feature. If you only want the text, the best setting is Black and White at 300 DPI.

To prepare a book for multipage scanning, you will need to tear it apart. Check with your campus print shop to see if they have a guillotine for removing the spines from books. Also check to see whether they have a comb-binding system or some other binding procedure for putting the pages back together when you're finished. (Some print shops will even do the scanning for you!)
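The 300 DPI and 400 DPI recommendations above differ more than the numbers suggest, because pixel count grows with the square of the resolution. The short Python sketch below works out the arithmetic. It assumes a US letter page (8.5 x 11 inches) and an uncompressed 1-bit black-and-white image; real TIFF files are normally compressed, so treat the sizes as upper bounds, not as what your scanner will produce.

```python
from __future__ import annotations

# Rough page-size arithmetic for scanning resolution.
# Assumptions: a US letter page (8.5 x 11 inches) and an uncompressed 1-bit
# black-and-white image. Real TIFF files are usually compressed and smaller.


def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bw_bytes(width_px: int, height_px: int) -> int:
    """Bytes for a 1-bit image, with each row padded to a whole byte."""
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


if __name__ == "__main__":
    for dpi in (150, 200, 300, 400):
        w, h = page_pixels(8.5, 11, dpi)
        mb = uncompressed_bw_bytes(w, h) / 1_000_000
        print(f"{dpi} dpi: {w} x {h} pixels, about {mb:.2f} MB uncompressed")
```

At 300 dpi a letter page is 2550 x 3300 pixels; at 400 dpi it is 3400 x 4400, or (400/300)² ≈ 1.78 times as many pixels. That extra detail helps with small type, but it is not automatically better for every page.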

Your campus may wish to develop a policy that when you are scanning (brailling, enlarging), you will only do the chapters of the book required for the class. You may also wish to have a policy that you will only scan books that have not been highlighted.

Scanning
Again, we recommend that you scan to TIFF using scanning software and then run optical character recognition on the scanned image.

Adjusting scanner values
Mode
Black and white: text, music, and line drawings; photos will not be rendered well
Grayscale: black-and-white photos, text on a colored background
Color: color drawings, graphics, or photos (only on color scanners)

Resolution
300 dpi (dots per inch) for Black and White; 400 dpi for Black and White ED
More is not necessarily better when scanning text. For example, if you are scanning text on thin paper, scanning at high resolution may pick up bleed-through text from the back of the paper, whereas dropping the resolution, perhaps even to 200 dpi or 150 dpi, may result in a better scan.

Page thickness
Don't mix different thicknesses of paper. If you are feeding glossy or thin paper, you may need to adjust your feeder.

Contrast
Contrast is a measure of how much difference there will be between light and dark parts of an image. OCR quality depends heavily on good brightness and contrast settings. Not all scanners allow manual adjustment of contrast.

Brightness
Brightness is a measure of how dark or pale a scanned image will be. OCR quality depends heavily on good brightness settings. An image where letter shapes run together is too dark. When letter shapes are thin or broken, the image is too light. Many scanners have an auto-brightness feature.

Colored backgrounds
Use the grayscale setting. You may need to decrease the brightness setting of the scanner. You may also need to check the contrast.

Gray boxes
Use black and white and adjust the brightness until you strike a balance between keeping most of the text in the main body and keeping as much text as possible in the gray boxes.

Scanning boost
Check out a "rescanning" product such as Virtual Rescan or Scan Fix.

File Structure
Create Folders for Scanning
Before you begin scanning, create a folder for scanned books, and within that folder create a template that you can copy and rename every time you scan a new book. The template should have folders for front matter, chapters, and back matter within a folder that will be named for the book.
[Figure: sample folder template, Level One and Level Two]
How you set up your e-text files is very important. Make sure that chapters are separated and clearly marked. Make sure that the table of contents and index are in separate files and can be found easily. The organization and naming convention used for the files and folders has an enormous impact on ease of navigation. Remember, if you use numbers without leading zeros, a program that sorts names as text will put 1, 05, 200, 30, and 3 in the following order: 05, 1, 200, 3, 30.
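The sorting problem described above, and the folder template, can be sketched in a few lines of Python. This is only an illustration: the folder names (`00_Front_Matter`, `01_Chapters`, `Chapter_01`, `02_Back_Matter`) and the helper `make_book_folders` are hypothetical choices, not part of any scanning software. The point is that zero-padding every number to the same width makes text order match numeric order.

```python
import tempfile
from pathlib import Path

# Text sorting compares names character by character, so "200" sorts before "3".
names = ["1", "05", "200", "30", "3"]
print(sorted(names))  # ['05', '1', '200', '3', '30']

# Padding every number to the same width (three digits here) restores numeric order.
padded = [n.zfill(3) for n in names]
print(sorted(padded))  # ['001', '003', '005', '030', '200']


def make_book_folders(root: Path, book_title: str, chapter_count: int) -> Path:
    """Create root/book_title with front matter, chapter, and back matter folders.

    Hypothetical naming: the leading numbers keep the three top-level folders
    in reading order, and two-digit chapter numbers keep chapters 1-99 in order.
    """
    book = root / book_title
    (book / "00_Front_Matter").mkdir(parents=True, exist_ok=True)
    chapters = book / "01_Chapters"
    chapters.mkdir(parents=True, exist_ok=True)
    for n in range(1, chapter_count + 1):
        (chapters / f"Chapter_{n:02d}").mkdir(exist_ok=True)
    (book / "02_Back_Matter").mkdir(parents=True, exist_ok=True)
    return book


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        book = make_book_folders(Path(tmp), "Sample_Book", 3)
        for path in sorted(book.rglob("*")):
            print(path.relative_to(book))
```

Copying the resulting empty tree for each new book gives you the same reusable template the handout recommends, with names that stay in order in any file listing.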

Labeling
Campuses vary in how they interpret copyright law. Many campuses feel that the ADA and, possibly, fair use give them the right to scan. However your campus interprets the laws, you are still legally required to give credit to the copyright holder. Although many campuses do not feel that they can legally work under the copyright exemption known as the Chafee Amendment, we still use that law as a guideline for best practices.

The Chafee Amendment does have a couple of restrictions. First, even under Chafee, you must still secure permission if you need to scan dramatic works (plays, scripts) that are under copyright (Shakespeare is not a problem). Second, the materials must contain a notice restricting further duplication and crediting the original copyright holder. The labeling section of the amendment states the following about the alternate format:
(B) bear a notice that any further reproduction or distribution in a format other than a specialized format is an infringement; and
(C) include a copyright notice identifying the copyright owner and the date of the original publication.

Based on Chafee, we recommend that you use wording like the following and include it on the label of any material provided to the student as well as in a text file on any CDs:
"Material in a format for use by print disabled student only. Further reproduction or distribution of this material is an infringement of copyright law. Copyright 0000, Publisher's Name."
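If you produce many CDs, filling in the recommended wording by hand invites typos in the year or publisher. A minimal Python sketch, assuming you want the notice as a plain-text file alongside the book files; the function names and the file name `COPYRIGHT_NOTICE.txt` are hypothetical, and the wording itself is copied verbatim from the recommendation above.

```python
from pathlib import Path

# Recommended label wording from this handout, with placeholders for the
# copyright year and publisher.
NOTICE_TEMPLATE = (
    "Material in a format for use by print disabled student only. "
    "Further reproduction or distribution of this material is an "
    "infringement of copyright law. Copyright {year}, {publisher}."
)


def copyright_notice(year: int, publisher: str) -> str:
    """Return the label text with the year and publisher filled in."""
    return NOTICE_TEMPLATE.format(year=year, publisher=publisher)


def write_notice_file(folder: Path, year: int, publisher: str) -> Path:
    """Write the notice to a plain-text file in `folder` (for example, the
    folder you burn to CD) and return its path. The file name is hypothetical."""
    path = folder / "COPYRIGHT_NOTICE.txt"
    path.write_text(copyright_notice(year, publisher) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    print(copyright_notice(2006, "Example Publisher"))
```

The same string can be pasted onto the printed CD label, so the label and the text file always match.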

Advanced Scanning Techniques
Improving Your Scanning
Scanning is more of an art than a science. Most of the time, auto settings with Black and White at 300 dpi will give you a very good scan, but some paper types may require adjusting the brightness, the contrast, or both to get a more accurate scan. Some small fonts may require increasing the resolution to 400 dpi. Some thin papers may require decreasing it to 200 or even 150 dpi. As with all arts, practice makes perfect, and it is always wise to scan and OCR a couple of representative pages to determine the best scanner settings for a book. It is also wise to keep track of the experiments you make and the settings that you find work well. After a while, you will remember which settings need to be altered, but in the beginning, writing down your settings is a good idea.

Scanner Settings
Mode
For most text, Black and White mode is preferable. There are occasions, however, when dealing with color or shaded backgrounds, that Black and White-ED (which uses an "error diffusion" algorithm to simulate halftones, i.e., grays) might be a good option. Note that the 9080C now offers an "Advanced Text Enhancement" mode to help with very light documents or when text is printed on a dark background.

Resolution
Your resolution will normally be set at 300 dpi. This resolution is considered optimal for text and is what OCR programs are geared to work with. Small text may require 400 dpi. If you have thin paper, you may be getting "bleed-through" from the back side. In such a case, drop your resolution to 150-200 dpi to improve the scan.

Brightness
Brightness lightens or darkens all the pixels on the page. Sometimes with very glossy paper, so much light bounces back from the page that you will need to reduce the brightness so that you don't get areas of "white out" where the image disappears entirely.

You can think of brightness as bringing "balance" to the image: not too dark, not too light. Increasing the brightness lightens the image. Decreasing the brightness darkens it. Brightness is a measure of how dark or pale a scanned image will be.
Too dark: letter shapes run together
Too light: letter shapes are thin or broken
The value scale is 1-255. The default setting is 128.
Lower numbers: darker (decrease in brightness)
Higher numbers: lighter (increase in brightness)

Contrast
Contrast is a measure of how much difference there is between the light parts and dark parts of an image. Changing the contrast alters the range of lights and darks. Increasing the contrast will make the lights lighter and the darks darker. Decreasing the contrast will lighten the darks and darken the lights.
The value scale is 1-13. The default setting is 7.
Larger contrast value (higher number): increases the contrast
Smaller contrast value (lower number): decreases the contrast

Threshold (Or where did my other bar go?)
Setting the mode to black and white will gray out the contrast bar and leave only the brightness bar. Although labeled "brightness," this bar now serves a slightly different function than it does when scanning in grayscale.

When scanning in black and white, the machine has to make a decision about all the grays in the image. Since black and white are the only choices, the scanner has to decide, "Should I call this gray black or call it white?" That cut-off point between black and white is known as the threshold. When you scan in black and white mode, the brightness bar functions as that cut-off point. Increasing the threshold will add more white to the image. Decreasing the threshold will add more black to the image.

To improve the scan when the textbook uses gray boxes around text, try increasing the threshold (brightness). Essentially, you are telling the scanner to treat the gray in those boxes as white. Take care, however, not to increase the threshold to the point that you start losing some of the main body of the text.

Color Dropout
If you are scanning a color book that has boxes or screens behind text, you can have the scanner drop out a color. Also note that most papers are slightly colored and not pure white. Dropping out the paper color can improve the scan. For yellowish papers, drop out red. For olive or greenish papers, drop out green. Note that the 9080C now allows you to drop out color on only one side of the page.

Gamma
Whereas contrast affects the end points of the darks and lights, gamma alters the midrange tones. Increasing the gamma will darken the midtones. Decreasing the gamma will lighten the midtones. Contrast this effect with adjustments to brightness, which changes the darkness or lightness of all tones, or to contrast, which increases or decreases the range of lights and darks.

The default factor setting is 1. Lower numbers will lighten the midrange grays; higher numbers will darken them. The settings range from 0.2 to 5 and can be set in 0.1 increments or adjusted with the mouse by clicking and dragging on the line. It is a good idea to adjust brightness and contrast first, then work with the gamma as necessary.

Despeckle/Remove Dot
If there are a lot of stray marks on the page, try using the despeckle or remove dot feature to help remove some of the "noise." Be aware, however, that if you are scanning a foreign-language document, some of those little dots and marks are supposed to be there.

Erase Notch
If characters look a bit jagged, erase notch should help to smooth them.

Reverse Image
Reversing the image causes black to be seen as white and white as black. Use this setting when most of the page (or at least the portion you least want to retype) is light text on a dark background. Note that the entire page is affected, so small sections of reverse text should be ignored.

Static
Sometimes with glossy paper, static electricity holds the pages together and causes double feeds. Get dryer sheets from the store. Tear off a strip and cut along one edge so that you have a fringe. Tape the strip above the paper tray so that the fringe brushes across the top of the paper as it is pulled through the feeder. Also, tape a similar strip to the back of the feeder so that it lies over the paper but remains in place as the paper is pulled from beneath the fringe.
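To make the threshold and gamma descriptions above concrete, here is a simplified Python model operating on 8-bit gray values (0 = black, 255 = white). It is an illustration, not any scanner's firmware: the gamma formula is the standard power curve, chosen so that higher values darken midtones as this handout describes, and the mapping of the 1-255 brightness/threshold setting to a cut-off of `256 - setting` is an assumption picked so that a higher setting turns more grays white.

```python
# Simplified model of two adjustments on 8-bit gray values (0 = black,
# 255 = white). The formulas are illustrative; scanner drivers may differ.


def apply_gamma(gray: int, gamma: float) -> int:
    """Gamma curve: 0 and 255 stay fixed while the midtones shift.

    With this formula, gamma > 1 darkens midtones and gamma < 1 lightens them,
    the same direction this handout describes for the gamma setting.
    """
    if not 0.2 <= gamma <= 5:
        raise ValueError("gamma should be between 0.2 and 5")
    return round(255 * (gray / 255) ** gamma)


def to_black_and_white(gray: int, setting: int) -> int:
    """Threshold a gray value; returns 255 (white) or 0 (black).

    Assumption: the 1-255 brightness/threshold setting maps to a cut-off of
    256 - setting, so higher settings turn more grays white.
    """
    if not 1 <= setting <= 255:
        raise ValueError("setting should be between 1 and 255")
    cutoff = 256 - setting
    return 255 if gray >= cutoff else 0


if __name__ == "__main__":
    # Black text (20) inside a gray box (110) on white paper (250).
    samples = {"text": 20, "gray box": 110, "paper": 250}
    for setting in (128, 160, 240):
        row = {name: to_black_and_white(g, setting) for name, g in samples.items()}
        print(setting, row)
    print(apply_gamma(128, 0.5), apply_gamma(128, 1.0), apply_gamma(128, 2.0))
```

With the default setting of 128, the gray box (110) comes out black and swallows the text. At 160 the box turns white while the text stays black. At 240 the text itself turns white, which is the loss of body text the threshold section warns about. The gamma line prints 181, 128, and 64: the same midtone lightened, unchanged, and darkened.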

Improving Accuracy
These hints are taken from the OmniPage Pro Help menu. Select settings that improve accuracy in the Options dialog box. Choose Options in the Tools menu, then click the tab in the Options dialog box for the settings you want to change.

Adjust the Brightness and Contrast Sliders in the Scanner Panel
Brightness: A measure of how dark or pale a scanned image will be. OCR quality depends heavily on good brightness settings. An image where letter shapes run together is too dark. When letter shapes are thin or broken, the image is too light. Many scanners have an auto-brightness feature.
Contrast: A measure of how much difference there will be between light and dark parts of an image. OCR quality depends heavily on good brightness and contrast settings. Not all scanners allow manual adjustment of contrast.
If your only criterion is OCR accuracy, prefer black-and-white scanning for good-quality documents with crisp black text on a white background. Choose grayscale scanning if you are scanning pages with text on colored or shaded backgrounds, or for degraded documents with low or varied contrast.

Identify Text/Graphic Blocks Correctly
Make sure blocks are identified correctly before OCR. When processing automatically, be sure your original layout setting is the best one for the document. Inspect the recognition results. If there are defects due to poor blocking on some pages, change the block properties and/or locations and re-recognize those pages. Make sure you do not have a block template file loaded that is unsuitable for your current pages. To retain handwritten text, such as a signature, identify it as a graphic zone.

Use High-Quality Images
In general, try to use original pages when you are scanning documents. Typeset, high-quality printed page images yield the best OCR accuracy. OCR accuracy may not be as good with lesser-quality pages. With low-quality originals, sometimes a good-quality photocopy can yield better OCR results. This may be true of documents with low contrast or printed on thin paper. On the other hand, poor-quality photocopies with stripes, blotches, or uneven brightness will usually give worse results. Ask senders to select Fine or Best Mode when they send you a fax.

Page images should be free of notes, lines, or doodles. Anything that is not a printed character slows recognition, and any character distorted by a mark may be unrecognizable. Try not to include such marks in blocks, or enclose them in an Ignore block type. Text in page images should be reasonably clean and crisp. Characters should be separated from each other and not blotched together or overlapping. If you have influence over the styling used in documents you want to recognize, avoid underlining. It is difficult to recognize underlined text because the underline changes the shape of descenders on the letters q, g, y, p, and j. The ideal resolution for OCR is 300 dpi. Images with less than 200 dpi or more than 400 dpi are liable to yield far lower accuracy. If you have the documents on paper, scan them again with better settings. If not, ask the people who supply your images to use 300 dpi.

Canon High Speed Scanners
Canon 5080C Scanner Device Setup
1. Make sure the computer and the scanner are off before connecting the scanner.
2. Connect your computer to the scanner using the cable.
3. On the back of the scanner, set the SCSI ID and terminator using the four DIP switches. The first switch is for termination; move it to the "up" position to terminate. The last three switches set the SCSI ID. You can leave them at the default (SCSI ID 2).
4. Connect the power cord.
5. Turn on the scanner first, then turn on the computer.
6. After you turn on the computer, the "Found New Hardware" wizard will come up. Click Next.
7. Select the second option, "Display a list of the known drivers..."
8. Under hardware type, select "Other Devices".
9. Under Manufacturer, select "Unknown".
10. Under Models, select "Unsupported Device".
11. When you get the warning that the driver may not be compatible, select "yes" to continue.
12. Click "yes" again to install the driver.
13. Click "finish" to close the wizard.
14. To verify installation, go to Device Manager; the scanner should be listed as "Unsupported Device" under "Other Devices".

Canon 5080C Scanner Software Install
1. Put in the CD, go to My Computer, and open the CD-ROM drive.
2. To install the TWAIN drivers, open the "Pixtran" folder and double-click the "setup" icon. Select the defaults when following the prompts.
3. After the wizard is finished, you will get the "Adaptec ASPI Upgrade" dialog box. Select "yes" and click "upgrade".
4. To install Scanning Utility 5000, go to My Computer and open the CD-ROM drive.
5. Open the "SU5000" folder and double-click the "setup" icon. Select the defaults when following the prompts.
6. When you have finished, do a test scan to verify that everything is OK.

Scanner Settings
We will be scanning with Scanning Utility 5000, which comes with the Canon DR-5080C. Choose Scanner Settings from the File menu.

Most of your scanning will be done in either Black and White (if there are no pictures in the book) or Black and White-ED (which uses an "error diffusion" algorithm to simulate halftones). Your resolution will normally be set between 200 and 400 dpi. Unclear text may require 400 dpi; however, more is not necessarily better when it comes to scanning and dots per inch! If you have thin paper, you may be getting "bleed-through" from the back side. In such a case, drop your resolution to 200 dpi to improve the scan. For now, we will leave the brightness and contrast on Auto. Click the Detail button to access more scanner settings.

Set the Canon to automatically Detect Page Size. It does a good job of capturing the entire page, but if it is leaving too large a black margin, adjust the Margin Scanning setting down by one or two millimeters.

Try using the double-feed detection feature. It can be very helpful when you scan a book that has recently had its spine removed. Frequently, small amounts of glue will have run onto the pages, causing two pages to stick together. Detecting double feeds by thickness does a good job of catching such misfeeds.

The Feeding Option should be set to Remote. This setting will allow you to run the Canon completely from the software. Any of the other three settings, including Auto, will require you to push the machine's Start button manually.

Make sure that the Backside settings are unchecked. Checking these options will tell the scanner to scan only the back side of a page. When using the separator sheets, set the separator option to "Skip, Continuous Scanning" either in this window or in the prompt when you choose to scan your pages.

Color Dropout
If you are scanning a color book that has boxes or screens behind text, you can have the scanner drop out a color. This feature is particularly effective with red, because a pure red, when scanned, comes out black. To set this option, go to File > Scanner Settings > Detail and choose the Dropout button. Select "Dropout Enable" and choose the color that you wish to have drop out. If you wish to retain color on part of the page, you can use the inhibit option to tell the scanner to ignore an area of the page measured in millimeters from the top of the page.
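Why does dropping out red help? A simplified Python model, offered as an illustration only and not as Canon's algorithm: ordinary grayscale mixes the red, green, and blue channels (here with the common ITU-R BT.601 weights), so a pure red box becomes a dark gray, while "dropping out" a color is modeled as reading only that color's channel, so red reads as white. The `inhibit_rows` helper is a hypothetical conversion showing how an inhibit distance in millimeters from the top of the page translates into pixel rows at a given resolution.

```python
from typing import Tuple

# Simplified model of color dropout. Keeping only the channel of the dropped
# color makes ink or paper of that color read as light. This illustrates the
# idea only; it is not Canon's actual algorithm.

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255
CHANNEL = {"red": 0, "green": 1, "blue": 2}


def normal_gray(pixel: Pixel) -> int:
    """Ordinary grayscale using the common ITU-R BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def dropout_gray(pixel: Pixel, dropout_color: str) -> int:
    """Gray value (0 = black, 255 = white) with `dropout_color` dropped out."""
    return pixel[CHANNEL[dropout_color]]


def inhibit_rows(inhibit_mm: float, dpi: int) -> int:
    """Pixel rows covered by an inhibit area extending `inhibit_mm` down from
    the top of the page at `dpi` (one inch is 25.4 mm). Hypothetical helper."""
    return round(inhibit_mm / 25.4 * dpi)


if __name__ == "__main__":
    red_box, black_text = (255, 0, 0), (0, 0, 0)
    print(normal_gray(red_box), dropout_gray(red_box, "red"))
    print(normal_gray(black_text), dropout_gray(black_text, "red"))
    print(inhibit_rows(10, 300))
```

In this model a pure red box has an ordinary gray value of 76, dark enough to come out black in black-and-white mode at a mid-range threshold, but with red dropped out it reads as 255 (white). Black text stays at 0 either way, so the text survives while the red screen disappears. An inhibit area of 10 mm at 300 dpi covers about 118 rows.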

Canon 9080C Scanner Setup
The Canon 9080C can be run either with a SCSI card and SCSI cable or with a USB cable. The machine can run on USB 1 but will be faster with a USB 2 connection. Early reports indicate that the SCSI option, if available to you, is the fastest. The 9080C has solved some of the odd setup problems we had with the 5080C: Canon drivers are now available on the Canon CD. If you will be running the Canon through the SCSI port, you can follow the first part of the Canon 5080C instructions, but use the actual Canon drivers. Make sure that you install the CapturePerfect utility to control the scanner.
Canon 9080C Scanner Device Installation
1. Make sure the computer and the scanner are off before connecting the scanner.
2. Connect your computer to the scanner using the cable.
3. Set the SCSI ID and terminator using the four DIP switches on the back of the scanner. The first switch controls termination; move it to the "up" position to terminate. The last three switches set the SCSI ID; you can leave them at the default (SCSI ID 2).
4. Connect the power cord.
5. Turn on the scanner first, then turn on the computer.
6. After you turn on the computer, the "Found New Hardware" wizard will appear and ask for the CD. Insert the CD, and it will install the driver automatically.

7. To verify the installation, open the Device Manager; the Canon should be listed under "Imaging Devices."
Canon 9080C CapturePerfect Installation
1. Insert the CD, go to My Computer, and open the CD-ROM drive.
2. Open the driver folder and double-click the "setup" icon.
3. Open the CapturePerfect folder and run "setup.exe."
4. When you have finished, do a test scan to verify that everything is OK.
Scanner Settings
We will be scanning with CapturePerfect 2.0, which comes with the Canon DR-9080C. Choose Scanner Settings from the File menu.

Most of your scanning will be done in either Black and White (if there are no pictures in the book) or Error Diffusion (which uses an algorithm to simulate halftones). "Advanced Text Enhancement" is a new feature that helps with very light documents. Note that CapturePerfect lets you assign your settings a name under "User Preference," so you can recall specific settings just by choosing the appropriate name.

Set the Canon to detect page size automatically. If it leaves too large a black margin, decrease the margin setting. Your resolution will normally be set at 200 to 400 dpi. Unclear text may require 400 dpi; however, more is not necessarily better when it comes to scanning and dots per inch! If you have thin paper, you may be getting "bleed through" from the back side. In such a case, drop your resolution to 200 dpi to improve the scan. For now, we will leave the brightness and contrast on Auto.

If you have one batch of documents to scan, choose the "Standard Feeding" option; the scanner will stop when the tray is empty. If you wish to feed a chapter at a time but continue scanning, choose "Automatic Feeding," and the scanner will wait for the next batch of documents. The 9080C also has new scanning options. For optimal speed, choose the "Scan Ahead" option; otherwise, the Canon will wait for your computer as it processes the graphic images. You can also choose "Prescan" to scan the first page of your document and then tweak your settings without having to rescan that first page. Click the "More" button to access more scanner settings.

Try using the double-feed detection feature. It can be very helpful when you scan a book that has recently had its spine removed: small amounts of glue frequently run onto the pages, causing two pages to stick together. Ultrasonic double-feed detection does a good job of catching such misfeeds. You can "deskew" your image (ensure that the image is straight) at the hardware level, the software level, or both. If you are scanning pages of varying sizes, it is best not to deskew with the roller.
Color Dropout
If you are scanning a color book that has boxes or screens behind text, you can have the scanner drop out a color. This feature is particularly effective with red, because pure red, when scanned, comes out black. To set this option, go to File > Scanner Settings > Detail and choose the Dropout button. Select "Dropout Enable" and choose the color to be dropped out.

Make sure that the Backside settings are unchecked. Checking these options tells the scanner to scan only the back side of a page. You can choose to drop out or enhance a color.
Scan Batch to File
To scan multiple pages, choose Scan Batch to File from the File menu.

Assuming that you are using your separator sheets and scanning into a folder based on your template, you will give the file a generic name, which you will change later.
[Screenshots: 5080C window; 9080C window]

The Canon will use that name for the first file, then append a number, beginning with 0001, to each successive file. Once you have completed the batch, go into the folder and rename the files. If your folders are set to the Web content view, you will see a thumbnail of each TIFF file.

Scanning with Separator Sheets (5080C)
Use the separator sheets to tell the Canon to begin a new file for each chapter. Set the Canon to scan double-sided pages, to skip the separator sheet (i.e., not to scan it), and to continue scanning: Duplex; Skip, Continue Scanning. Scan the front matter, chapters, and back matter separately, each into its own folder within the folder for the book. For the front matter and back matter, place separator sheets between each section. When a section does not break on a right-hand page, wait to place a separator sheet until you reach a right-hand page break, or use the Simplex/Duplex trick explained below. Similarly, if a chapter does not begin on a right-hand page, scan it with the preceding section and break at the next right-hand page. When you label the chapters, make sure you note the inclusive range of the chapters scanned together.
TIP: Use the table of contents to find the correct pages for inserting the separator sheets and to know where to look for the sheets when retrieving them after you've finished scanning.
Scanning with the Patch Code Sheets (9080C)
The Canon 9080C comes with patch code sheets as printable PDF files on the Canon CD. If you do a full install of the CapturePerfect program, the files will be placed in a folder labeled Canon DR-6080 & 9080C.

When using the patch code separator sheets, set the separator option to "Skip, Continue Scanning" so that the Canon responds to the patch code but does not include a picture of the barcode in your document. Use the "Patch T" code sheets and make sure the Separator Setting is set at zero degrees. The Canon CD comes with patch codes in letter size and A4 size; however, you can make any size you need by printing the patch codes on standard paper and cutting them to size. Be sure to measure from the top left corner of the sheet. Only the barcode is needed on the sheet; all other information is extraneous and will be ignored.
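The separator-sheet rule for the 5080C can be worked out in advance from the table of contents. Under that rule, a new file may begin only at a right-hand page, and a chapter that opens on a left-hand page is scanned with the section before it. This Python sketch assumes the usual convention that right-hand pages carry odd page numbers. The functions group_sections and batch_label and the sample chapter data are hypothetical.

```python
# Hypothetical planning helper. Assumes right-hand (recto) pages have odd
# page numbers, so a separator sheet can only precede a section that starts
# on an odd page; a section starting on an even page joins the previous batch.

def group_sections(sections: list[tuple[str, int]]) -> list[list[str]]:
    """Group (title, start_page) pairs, in book order, into scan batches."""
    batches: list[list[str]] = []
    for title, start_page in sections:
        if not batches or start_page % 2 == 1:
            batches.append([title])      # right-hand start: separator goes here
        else:
            batches[-1].append(title)    # left-hand start: scan with previous
    return batches


def batch_label(batch: list[str]) -> str:
    """Label a batch with the inclusive range of sections it contains."""
    return batch[0] if len(batch) == 1 else f"{batch[0]} to {batch[-1]}"


toc = [("Chapter 1", 1), ("Chapter 2", 15), ("Chapter 3", 28), ("Chapter 4", 41)]
for batch in group_sections(toc):
    print(batch_label(batch))
```

With this sample data, Chapter 3 starts on page 28, a left-hand page. It is therefore scanned together with Chapter 2 and labeled "Chapter 2 to Chapter 3."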

Scanning Separate Chapters
To scan a chapter separately when it does not begin on a right-hand page or end on a left-hand page, you will use a combination of simplex and duplex scanning. For example, when a chapter does not end on a left-hand page, its last page falls on the front of a sheet, and the reverse of that sheet is the beginning of a new section. In such a case, scan the body of the chapter in duplex. When that section is finished, the scanning utility will ask whether you want to stop or continue scanning. Place the final sheet on the Canon, set the scanning to simplex, and continue the scan. Stop scanning once that last page is complete. Similarly, if the chapter begins on a left-hand page, scan that page by itself with the scanner set to simplex. Once that page has been completed, scan the rest of the chapter in duplex, remembering that if the chapter does not end on a left-hand page, the final page will also be scanned in simplex.
Improving the Scan
Scanning is more of an art than a science. Most of the time, Auto settings with Black and White ED (assuming you have some graphics in the document) at 300 or 400 dpi will give you a very good scan, but some paper types may require adjusting the brightness, the contrast, or both to get a more accurate scan. As with all arts, practice makes perfect, and it is always wise to scan and OCR a couple of representative pages to determine the best scanner settings for a book.
Brightness
Brightness is a measure of how dark or pale a scanned image will be.
Too dark: letter shapes run together.
Too light: letter shapes are thin or broken.

The value scale is 1 to 255; the default setting is 128.
Lower numbers: darker
Higher numbers: lighter
To improve the scan when the textbook uses gray boxes to highlight text, try increasing the brightness. I have found that a brightness setting around 240 to 255 with black and white at 300 to 400 dpi produces a scan in which most of the text in the gray boxes can be read while the body text is still maintained.
Contrast
Contrast is a measure of how much difference there is between the light and dark parts of an image. The value scale is 1 to 13; the default setting is 7.
Larger contrast value (higher number): increases the contrast
Smaller contrast value (lower number): decreases the contrast
Color Drop-out Tip
If you are scanning a book that has black text on an off-white background, try setting your scanner to drop out red. Even if there is no "color" in the book, the background will come out "whiter." Thanks to Chris Weidman for this tip. Also, with newsprint-type papers, try dropping out red or green and see whether it affects the scan results.
Thin Paper
If you have to scan a book on thin paper, such as a dictionary, you may need to drop the resolution to 200 or even 150 dpi.
Glossy Paper
Often with glossy paper, it's static electricity that holds the sheets together and causes double feeds. Sam Ogami suggested I use fabric softener sheets. I thought it was a joke at first, but it works great. I tear off three 1 x 4 inch strips and tape them to the top of the feeder so that they brush along the top of the sheets as they go through the scanner. It's been working like a charm for me. Thanks to Brian Brautigam for this tip.
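The simplex/duplex technique described under Scanning Separate Chapters depends only on whether a chapter's first and last pages fall on left-hand or right-hand pages. The steps can therefore be planned from the table of contents before you start. This Python sketch assumes that right-hand pages carry odd numbers, so each sheet holds an odd page on the front and the following even page on the back. The scan_plan function is a hypothetical helper, not part of any scanning utility.

```python
# Hypothetical planning helper: splits a chapter into simplex and duplex
# steps. Assumes right-hand (front-of-sheet) pages have odd numbers.

def scan_plan(start: int, end: int) -> list[tuple[str, list[int]]]:
    """Return (mode, pages) steps for scanning pages start..end inclusive."""
    if start > end:
        raise ValueError("start page must not exceed end page")
    steps: list[tuple[str, list[int]]] = []
    first, last = start, end
    if first % 2 == 0:
        # Opens on a left-hand page (back of a sheet): scan it alone in simplex.
        steps.append(("simplex", [first]))
        first += 1
    closing_step = None
    if last % 2 == 1 and last >= first:
        # Ends on a right-hand page (front of a sheet): scan it alone in simplex.
        closing_step = ("simplex", [last])
        last -= 1
    if first <= last:
        # Everything in between occupies whole sheets: scan in duplex.
        steps.append(("duplex", list(range(first, last + 1))))
    if closing_step is not None:
        steps.append(closing_step)
    return steps


print(scan_plan(2, 21))
```

Take a chapter running from page 2 to page 21. The plan is page 2 in simplex, pages 3 through 20 in duplex, and page 21 in simplex.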

Creating Job Separator Sheets
You can copy the barcode for the job separator or patch code sheets on a copy machine and cut the new sheets to any size you wish. Copy the sheets that came with the Canon scanner rather than previously copied job separator sheets. For the 5080C, make sure that the barcode is centered on the sheet and 10 mm (0.4 in.) from the top of the page. For the 9080C, measure the sheet from the top left edge and cut as needed. If separator sheets become bent, folded, marked, or stained, they may not function.
Fujitsu Scanner
To begin with, you need to select the file type. Color is generally scanned to JPEG; black and white books are scanned to Multipage TIFF. Set the mode to black and white unless you need to keep the pictures.

File size of the same image scanned in each mode:
B/W: 474 KB
Grayscale: 3,731 KB
Halftone: 474 KB
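Bit depth largely explains these figures: grayscale is roughly eight times larger than B/W, and B/W and halftone come out the same size. Black-and-white and halftone images are both 1-bit, because halftone simulates gray with patterns of black and white dots. Grayscale, assumed here to be 8-bit, stores eight bits per pixel. The Python sketch below computes raw sizes for an assumed 8.5 x 11 inch page at 300 dpi. Actual files are compressed, so the measured sizes above differ from these raw figures.

```python
# Raw (uncompressed) size by scan mode. Assumes 8-bit grayscale and
# 1-bit black-and-white/halftone; real TIFF/JPEG files are compressed.

BITS_PER_PIXEL = {"black and white": 1, "halftone": 1, "grayscale": 8}


def raw_size_bytes(width_px: int, height_px: int, mode: str) -> int:
    """Bytes needed to store width_px x height_px pixels in the given mode."""
    return width_px * height_px * BITS_PER_PIXEL[mode] // 8


width_px, height_px = 2550, 3300  # an 8.5 x 11 in page at 300 dpi
for mode in BITS_PER_PIXEL:
    print(f"{mode}: {raw_size_bytes(width_px, height_px, mode):,} bytes")
```

The raw grayscale figure (8,415,000 bytes) is exactly eight times the B/W figure (1,051,875 bytes). That is close to the ratio between the 3,731 KB and 474 KB files measured above.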

Under Option, look first at the Rotation tab, then at the Filter tab.

Property button: lets you choose which color fills the area around the page and how far that fill extends.
[Screenshot: Advance button]

Below is a page with orange highlighter on it, scanned without color dropout.

The same page, scanned with red dropped out.

OCR Processing
Although you can scan with either OmniPage or FineReader, we recommend that you scan your files to TIFF and then work with the resulting multipage image. There are a number of reasons. It preserves the TIFF files for later use with other applications. It prevents problems with crashes in the middle of scans. It also lets you take full advantage of the native scanning utilities that come with your scanner. We recommend OmniPage Pro or ABBYY FineReader for your OCR processing.
When you scan, you take a picture of the page. Running OCR (optical character recognition) on the scanned document means using a program that compares the shapes it finds on the page with the shapes it holds in memory as known text characters. The cleaner and clearer the copy, and the fewer the graphics and symbols in the text, the better the text recognition will be. The Kurzweil program might seem more accurate because it uses a TIFF (an exact picture), but be aware that Kurzweil, too, runs OCR on the file. The only way to get scanned text into an editable form is through an OCR process.
Different Languages
Choose all the languages that are in the document. For math, you may find it helpful to include Greek among the languages so that the OCR program can recognize the Greek symbols used in mathematics. Note that FineReader has a number of special character recognition options for computer languages, medical language, etc.

ABBYY FineReader
ABBYY Software House
3823 Spinnaker Court
Fremont, CA 94538
510-226-6717
www.abbyyusa.com
Interface Options Set-up
In the menus, go to Tools > Options or use the keyboard shortcut CTRL + Shift + O.

General Tab
Under the General tab, make sure that "Open last batch at startup" is unchecked.
View Tab
Under the View tab, note that in the combo box at the top of the page, you can choose which colors to use for highlighting uncertain characters, non-dictionary words, etc.

Scan/Open Tab
Under the Scan/Open tab, be sure to uncheck the option to detect image orientation. This box is checked by default, and leaving it checked will sometimes result in pages being rotated 90 degrees automatically when they should not be.
Read Tab
Under the Read tab is the Pattern Editor option. The pattern editor may be used under the following conditions:

recognizing texts set in decorative fonts;
recognizing texts containing unusual characters (e.g., mathematical symbols);
recognizing large volumes (more than a hundred pages) of texts of low print quality.
When you use the pattern editor, you train FineReader by telling it exactly what it is looking at (i.e., you tell the program "this is an a," "this is a b," etc.). When you close a batch, the pattern is lost unless you save it in a batch template (the Save Options As feature on the General tab). This feature is really only helpful when FineReader is consistently struggling with a particular font or symbol set in a document.
Also under the Read tab is the option to choose which languages are found in the document. At the top of the list is an option to choose multiple languages. This option is handy for foreign language textbooks and math (choose English and Greek to include Greek letters). The option for setting languages is also on the toolbar; however, more options are available than are shown on the toolbar. To see all the options, click the Edit Languages button.

You can expand any of the lists to view all the languages available. Most of them are not generally useful; however, the Formal Languages choices include a number of computer languages as well as simple chemical formulas. These options are not set as part of the defaults. If you are scanning a computer science or chemistry book, you can check the language that you wish to have available in the list. Note that the "Show this language" box must also be checked.
Check Spelling Tab

Under the Check Spelling tab, you can uncheck the "Stop at words not found in dictionary" option if you will be spell checking later in Word anyway. If you are going to PDF and will not be running a spell check, leave this option checked.
Save Tab
Under the Save tab is a button that allows you to customize the settings for saving the various formats. Clicking the Formats Settings button opens a window in which you can customize whichever format you will be saving the text into. As an example, we will look at DOC (the format for Microsoft Word documents).

FineReader offers the wonderful option of removing optional hyphens from the Word document. Optional hyphens are the hyphens inserted at the syllable break at the end of a line to move the rest of the word down to the next line. OCR programs generally retain these hyphens because the program does not know whether a hyphen is there just to save space or is always required. In fact, such hyphens are rarely required, and a spell check will pick up any hyphens that should not have been removed.
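If you post-process exported OCR text outside Word, the same cleanup can be scripted. This Python sketch assumes that optional hyphens survive export to plain text as the Unicode soft hyphen (U+00AD). Whether they do depends on your export path, so treat that as an assumption to check. Regular hyphens are left alone, so required hyphens such as "well-known" are unaffected. The function name is hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode soft (optional) hyphen


def remove_optional_hyphens(text: str) -> str:
    """Delete soft hyphens, rejoining words that were split across lines."""
    # A soft hyphen followed by a line break marks a word split at a line end:
    # drop both so the two halves of the word are joined again.
    text = re.sub(SOFT_HYPHEN + r"\r?\n", "", text)
    # Any soft hyphens left mid-line are invisible break hints; drop them too.
    return text.replace(SOFT_HYPHEN, "")


print(remove_optional_hyphens("acces\u00ad\nsible and well-known"))  # accessible and well-known
```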

OCR Process Beginning with an Image File (TIFF)
Step One: Open an Image
Step Two: Analyze Layout
On complex layouts, use the Analyze Layout option (under Process) first, then read the pages. Otherwise, if you read first and then adjust the blocks, you will have to read the page again for all the changes to take effect.

Step Three: Adjust Blocks
Use the tools to add or delete blocks. To reorder zones, right-click and change the properties.
Step Four: Read All

Step Five: Check Spelling
Step Six: Save the Document

FineReader Tips
Zoom window: FineReader has a zoom window (View > Zoom Window) that allows you to enlarge selected areas of the image or text.
Stop spell check: To have the program find only OCR errors and not unknown words, go to Tools > Options > Check Spelling and uncheck "Stop at words not found in dictionary."
Reordering zones: To make reordering zones simple, add the shortcut to the Image Tools toolbar. Go to View > Toolbars > Customize. Choose "Image" as the category and "Image Tools" as the toolbar. Under "Commands," choose Renumber blocks, click the arrow to move it onto the toolbar, and click Close.
Save to file: Use the Save to file option to save as PDF, HTML, etc.
Format settings: Tools > Format settings gives you access to many controls for customizing how your documents will export. Note especially the option to delete optional hyphens before going to Word.
Eraser: The eraser tool allows you to edit the underlying TIFF file by deleting pixels (i.e., changing black to white). If you wish to save the changes, choose File > Save Image As.
MS Word
Be aware that when you take text into Word, some of your text may seem to disappear. What has happened is that the spacing and font size are pushing text off a page; adjust the formatting, and you will see the text again. We recommend retaining only fonts and paragraphs unless you need a one-to-one page correspondence.
Cleaning up Hyphens
If you have not chosen to remove the optional hyphens, you will see them in the Word document. These hyphens fall at the ends of lines in the book, and the OCR program includes them in the text that goes into Word. To delete them, search for optional hyphens (^-) and replace them with nothing.
Knowing What Word Is Up To
MS Word has a number of features intended to help less knowledgeable users format documents easily. Although these features appear to make life easier, they actually create problems when you are using a document for multiple purposes. Setting the options below will give you more control over your document.
1. Turn off Word's AutoFormat As You Type features (Tools > AutoCorrect). Leave "Define styles based on your formatting" checked, but uncheck all the others.

2. Work with the Show/Hide option turned on (the ¶ symbol on the Standard toolbar).
Making the Changes When You Want Them
Although you need to turn off the AutoFormat As You Type features, leave the AutoFormat features turned on; you apply these at your discretion.
1. Leave the replace features checked under AutoFormat.

2. Apply the changes manually if and when you choose. To access this option, go to Format on the menu bar and choose AutoFormat.
Understanding Styles
Styles contain information about how a paragraph is to be formatted. You set options for the font, including which font, its size, its style, and special effects. You also set options for the paragraph as a whole, including alignment, indentation, spacing before and after, borders and shading, etc.

The wonderful benefit of styles is that they allow you to take one e-text document seamlessly into a number of applications: Duxbury, PDF, HTML, etc. They also allow you to make global changes to a document when you need to make slight modifications for various e-text uses. Styles are accessed in the Styles and Formatting task pane (Format > Styles and Formatting). Right-clicking the down arrow next to a style's name gives you the option to modify the style. Selecting Modify opens another window in which you can choose whether to modify the font, the paragraph, the borders, etc.

If you wish to adjust the style manually while working in the regular document, you can use the "Automatically update" option so that your modifications are applied to the style globally. Note that this feature does not work with the Normal style. Once you have finished making your changes, however, make sure you uncheck the "Automatically update" box.

Note that you can add the style to your template by checking the "Add to template" box in the lower left-hand corner of the Modify Style dialog.
Formatting Individual Words/Phrases
Sometimes individual words or phrases need to be bold or italic. In those cases, select the text to change, then apply the Strong style for bold and the Emphasis style for italic.
Formatting with the Ruler
You can use the ruler to change the tab spacing as well as the paragraph indent.
Columns
If you need to use columns, it is crucial that you work with either tables or the column setting. Do not, under any circumstances, use tabs or spaces to create columns. Go to Format > Columns and choose the number of columns you wish.

Columns look better if the text is justified, a setting you can select for the paragraph's alignment.
Tables
Simple tables have become a very convenient option in e-text. JAWS now reads them quite well, Duxbury can handle them, and you can convert them easily into PDF or HTML. With a little planning, they're not even too bad going into ASCII. If you know how many rows and columns you want in your table, go to Table > Insert > Table and select the number of rows and columns you want.