CONTENTS

PREFACE
  About This User’s Guide
  Organization of this user’s guide
  Documentation conventions
  Related Documentation
  Technical Support

INTRODUCTION TO TEXTBRIDGE
  Basic OCR Concepts
  Features and Benefits
  New Features
  Enhanced Features...
INSTALLING AND SETTING UP TEXTBRIDGE
  What Comes with TextBridge
  Supported Scanners
  Installing and Testing Your Scanner
  System Requirements
  Before Installing TextBridge
  Uninstalling a Previous Version of TextBridge
  Using TextBridge with Pagis
  Learning about TextBridge before you install it...
LEARNING TO USE TEXTBRIDGE
  Before Beginning to Process a Document
  Using TextBridge to Process a Document
  Starting TextBridge
  Using Automatic Processing
  Using Manual Processing
  Performing Basic Operations
  Selecting the Page Source
  Selecting the Page Type...
ADVANCED SAMPLE SESSIONS
  Session 1: Processing a Document to Use in a Database
  Session 2: Using Zone Templates and Page Types
  Session 3: Training TextBridge OCR
  Where to Go From Here

INDEX

TextBridge Pro Millennium User’s Guide...
PREFACE

ScanSoft, Inc. welcomes you to TextBridge Pro Millennium for Windows® 95, 98, 2000, and Windows NT® 4.0. The documentation that comes with TextBridge provides you with the information you need to operate TextBridge. The documentation includes this user’s guide, a Help system, and Release Notes.
This manual is provided both in print and electronic form. The entire user’s guide is provided as a digital document in Adobe® Portable Document Format (PDF). To view the user’s guide in PDF format, you need Adobe Acrobat Reader, which is installed with TextBridge unless you already have it on your PC.
Chapter 5 “Sample Sessions with TextBridge” walks you through several practice sessions designed to help you to learn and use the important features of TextBridge. Chapter 6 “Advanced Sample Sessions” describes more complex and less frequent uses of TextBridge. The Index provides a comprehensive list of topics to assist you in quickly locating the specific information you need.
RELATED DOCUMENTATION

TextBridge provides a comprehensive set of printed and digital documentation designed to assist you in learning and operating the product. The documentation provided with TextBridge covers all aspects of installation and operation.

Note: Information provided in individual documents is not duplicated in other documents except for basic information about TextBridge.
Online User’s Guide. An online version of the complete user’s guide is provided in Adobe Acrobat format (.pdf). You can access the user’s guide from the installation menu and the TextBridge Help menu, or you can open it from Adobe Acrobat Reader. Printed User’s Guide.
Additional information about contacting TextBridge Technical Support is provided in the TextBridge Help menu. If you must contact ScanSoft Technical Support, the following information will help in solving the problem:

Your software version number (This is on the back of the CD envelope and in the Help menu under About TextBridge.)
Your software serial number (This is the serial number on the back of the TextBridge CD-ROM...
INTRODUCTION TO TEXTBRIDGE

Welcome to ScanSoft’s TextBridge Pro Millennium, optical character recognition (OCR) software for Microsoft Windows® 98, 2000 and Windows NT® 4.0. This chapter provides an introduction to TextBridge including:

Basic OCR concepts
Features and benefits
Characteristics of documents TextBridge can recognize
Input image file formats
Output text file formats
Output image file formats...
You can use TextBridge to scan and convert printed pages to text documents for your word processor, spreadsheet program, web browser, database program, or other text application. Pages may be from most sources, including computer printers, fax machines, photocopiers, magazines, and newspapers. Pages can be black and white or color.
In most cases, TextBridge understands your original document’s format and maintains the layout, including columns, headers, footers, pictures, and picture captions. Pictures can be black and white, grayscale, or color. Recomposition is possible only if your text program supports pictures and layout. For example, recomposition is supported in Microsoft Word and Corel WordPerfect but not in Notepad.
New Features

TextBridge Pro Millennium offers these new features to increase your productivity:

Windows 2000 Certification. Makes use of the latest Windows technology to assure a consistent user experience and a more reliable and manageable application.

Updated scanner support. Includes the latest Scanner Wizard hint file for easy setup of popular scanners.
Table recomposition. Advanced analytical capability results in very accurate table reformatting. Ability to edit the entire table as well as individual cells for improved recognition. Cell table recomposition is supported even if you do not choose to retain layout. Flexible multi-page document handling. Ability to view and manipulate the pages of a document using the page thumbnails.
Integration with the latest scanners. TextBridge works with the most recent scanners. The Release Notes and the ScanSoft Web site at www.scansoft.com provide the latest information about supported scanners and getting your scanner to work with TextBridge.

HTML 4.0 output and WYSIWYG capability. Output files in the latest version of HTML and preserve the original look using cascading style sheets.
TextBridge supports output formats for the following programs that retain page layout:

Internet Explorer
Netscape
Word 6.0, 7.0, 97, and 2000
WordPerfect 6.0, 6.1, 7.0, 8.0, and 9.0
Any word processor that supports RTF

Retaining pictures is independent of retaining layout. Some text programs retain pictures even though they do not retain layout.
Dynamic OCR training. You can train TextBridge’s OCR to improve recognition accuracy as the job progresses. Use dynamic training with difficult documents, such as faxes or multi-generation photocopies. TextBridge enables you to interact with the OCR process by viewing and then accepting or correcting its automatic recognition decisions.
Deferred processing. TextBridge enables you to scan all the pages of a document to a TIFF or XIF file, then later open the image file for document recognition. You can also save all the pages to a multi-page image file or save each page as a separate file.
DOCUMENTS TEXTBRIDGE CAN RECOGNIZE

TextBridge includes a number of advances developed by ScanSoft, Inc. and at the Xerox Palo Alto Research Center (PARC). Consequently, TextBridge provides highly accurate OCR and format retention on the widest range of documents. TextBridge can recognize documents with the characteristics in the following list:

Documents printed on typewriters, phototypesetters, and impact, ink-jet, dot-matrix, and laser printers...
INPUT IMAGE FORMATS

The source of page images for TextBridge can be your scanner or it can be image files. TextBridge can recognize the following types of image file formats:

Image File Format                           File Name Extension
Windows bitmap                              .bmp
PCX                                         .pcx
Multi-page PCX used in some fax programs    .dcx
...
OUTPUT FORMATS

TextBridge can convert its recognized text and pictures to files for the following programs and formats:

Programs and Formats          File Name Extension
Ami Pro 2.0 and 3.0           .sam
dBase IV                      .dbf
DisplayWrite 5                .rft
Excel 97 and 2000             .xls
Excel 3.0, 4.0, and 5.0       .xls
...
Programs and Formats                   File Name Extension
Word 6.0 and 7.0 (RTF)                 .doc
Word 97 and 2000 (RTF)                 .doc
WordPerfect 4.2 and 5.1                .wpf
WordPerfect 6.0, 6.1, 7.0, and 8.0     .wpd
WordStar                               .wsd
Works                                  .rtf

Microsoft Word (RTF) format is also accepted by a number of...
WHERE TO GO FROM HERE

To learn how to install and set up TextBridge on your system, go to Chapter 2.

To learn how TextBridge recognizes a document and how you prepare TextBridge to do this, read Chapter 3. That chapter explains the basic concepts and functions of the software.

To learn how you use TextBridge to process simple and complex documents, refer to Chapter 4.
INSTALLING AND SETTING UP TEXTBRIDGE

This chapter describes the TextBridge software installation and setup procedures. Specifically, it covers these topics:

What comes with TextBridge
Supported scanners
Installing and testing your scanner
System requirements
Before installing TextBridge
Installing TextBridge
Scanner setup
Setting up Instant Access to TextBridge
Updating your TextBridge software
Uninstalling TextBridge Pro Millennium

To get started quickly, proceed to the installation procedure on...
WHAT COMES WITH TEXTBRIDGE

TextBridge comes with the following items:

One installation CD-ROM. The CD-ROM includes software programs, language packs, sample document image files, release notes, Help files, online user’s guide in Adobe PDF format, and Adobe Acrobat Reader.
A printed user’s guide to get you started.

Check to be sure that you have all the items listed above.
Depending upon the design of your TWAIN driver, you may not be able to scan in color with TextBridge. If you have a triple-pass scanner, use it in single pass, black and white mode only. If you have a Visioneer sheetfed scanner, use the Visioneer Paperport software and drag and drop an image onto TextBridge or your word processor.
INSTALLING AND TESTING YOUR SCANNER

Refer to the manufacturer’s detailed instructions for installing your scanner. They provide the most precise information for setting up your scanner. The basic steps for installing a scanner are:

1. Install the correct scanner interface card (if one is necessary) in the PC bus.
SYSTEM REQUIREMENTS

To install and run TextBridge, your Windows-compatible PC must be equipped with the following:

An Intel (or compatible) 80486 or Pentium™ microprocessor. We recommend Pentium for the best performance.
A VGA, SVGA, or multi-sync color monitor.
A minimum of 24 megabytes (MB) of random access memory (RAM) for Windows 95 and 98;...
BEFORE INSTALLING TEXTBRIDGE

After you install your scanner and check that it is working properly, you are ready to complete other preparations for installing TextBridge and learn more about TextBridge.

Uninstalling a Previous Version of TextBridge

If you have an older version of TextBridge, uninstall it before installing TextBridge Pro Millennium.
5. Click Yes to continue the uninstall process, or click No if you decide to quit the uninstall process. If you click Yes, TextBridge proceeds with the uninstall. When it is finished, the Uninstall Complete dialog box appears.
6. Click OK to restart your computer.

With these steps finished, TextBridge is removed from your PC.
Using TextBridge with Pagis

The Pagis program from ScanSoft is a color scanning software suite that includes TextBridge. It enables you to scan, copy, fax, view and edit, index, search, and manage electronic documents. If you have Pagis Pro 2.0 or later installed, Pagis will use the latest version of TextBridge available on your PC.
Browse the CD. Windows Explorer opens the TextBridge CD for you to view the folders and files that come with the TextBridge installation program. Visit ScanSoft’s Web site. Your Web browser goes to the ScanSoft Web page where there is additional information about TextBridge and other ScanSoft products.
2. Click Install TextBridge Pro Millennium. Follow the onscreen prompts and instructions to install TextBridge Pro Millennium. 3. Specify when you want to restart your PC, then click Finish. Restarting is necessary to complete the TextBridge setup. We recommend that you restart immediately. However, if you want to perform other activities before restarting, click No.
1. On the Windows task bar, click Start.
2. Point to Programs, then point to the TextBridge Pro Millennium folder, and then point to Scanner Setup.

Scanner Setup is also available from the TextBridge Tools menu. Follow the instructions in the Scanner Setup wizard to install or test your scanner setup.
To provide Instant Access to TextBridge from an application, use the following procedure: 1. On the Windows task bar, click Start. 2. Point to Programs, then point to the TextBridge Pro Millennium folder, and then point to the Instant Access Control Panel.
UNINSTALLING TEXTBRIDGE PRO MILLENNIUM

To restore your PC to the state it was in before you installed TextBridge Pro Millennium, use the following procedure:

1. Close all active applications, including TextBridge.
2. On the Windows task bar, click Start.
3. Point to Settings, then click on the Control Panel folder to open it.
WHERE TO GO FROM HERE

To learn how TextBridge recognizes a document and how you prepare TextBridge to do this, read Chapter 3. That chapter explains the basic concepts and functions of the software.

To learn how you use TextBridge to process simple and complex documents, refer to Chapter 4.
BASIC TEXTBRIDGE OPERATIONS

This chapter provides information about the process of page recognition. Use this chapter to learn about optical character recognition (OCR), page recognition, recomposition, and operations that will help you use TextBridge effectively, including automatic and manual processing and page types and settings for recognition.
WHAT IS TEXTBRIDGE OCR?

TextBridge is OCR software that turns paper documents or page image files into text documents on your PC. Page image data is electronic information about the pages of a document that comes from a source such as your scanner or fax software. This data becomes an image document and is stored in an image file.
Page Type           Scan Size   Print Type   Page Layout      Picture Output
...                 Letter      Fax          ...              Gray
Legal               Legal       Good         Single column    B & W
Letter              Letter      Good         Single column    B & W
Magazine (b & w)    Letter      Good         Multi-column     Gray
Magazine (color)    Letter      Good         Multi-column     Color
Newspaper...
Scanning grayscale (or color) rather than black and white can improve text recognition on pages with difficult-to-recognize text. However, grayscale scanning is slower than black and white scanning. Page sources You can get pages to process from your scanner or from page images.
In addition, some complex, free-form layouts defeat TextBridge’s recomposition capabilities. For these types of documents, it is often best to preview pages and manually zone text and image zones that you want to capture. Retain pictures keeps pictures in the saved document if the document format supports pictures.
RUNNING TEXTBRIDGE STANDALONE AND INSTANT ACCESS

You can run TextBridge as a standalone program or invoke it from within another program with Instant Access. You can also invoke TextBridge through image file context menus and drag-and-drop.

Note: Instant Access is also available from the Start menu.

Standalone Program

The TextBridge standalone program is a conventional,...
Instant Access

Instant Access runs more automatically than TextBridge standalone with a minimal, dialog box-based user interface. The entire document is processed with little intervention by you. Instant Access gives you direct access to TextBridge from programs such as Word and WordPerfect. Programs with Instant Access have a TextBridge command in the File menu.
The programs in the following list do not have Instant Access capability:

Acrobat Exchange
Acrobat Reader
Clipboard Viewer
Corel Quattro Pro
File Manager
HotMetal Light
Netscape
Netscape Editor

IMPROVING RECOGNITION WITH SETTINGS

There are a number of settings that you select in TextBridge at the beginning of the recognition process to help it recognize a document with more accuracy.
Figure 3–2. Original Page tab in Page Type Settings dialog box

This dialog box has three tabs: Original Page, Scanner, and Processing. Each lets you view or change Page Type settings.

Original Page Settings

On the Original Page tab, you can choose the following settings:

Set the page orientation for the way text and images are printed on the original page:
Any orientation...
Select the page layout of the original page:

Any layout
Single column
Multi-column
Table
As zoned by template

When you select Any layout, TextBridge automatically determines the page layout. Use Any layout when pages in your document have different layouts or when your pages have complex layouts that do not fit the above layouts.
Scanner Settings

You can view and change the settings for your scanner in the Scanner tab of the Page Type Settings dialog box (Figure 3–3). On the Scanner tab you can set:

Original Page quality:
Good print
Difficult or degraded

Picture Output:
Black and White
Gray...
TextBridge determines the best scan resolution and color for the Original Page and Picture Output settings. Click Custom if you want to override this default scan resolution setting. Set the scan page size to reflect the actual size of the original page.
On the Processing tab: Select the primary language of the document. If you select more than one language, they all must be in the same language group. You cannot change the language group after you begin processing a document. Select the user dictionary you want used when processing pages. You can add technical terms and proper names to a user dictionary during proofreading and training.
For Auto Save and Send To, use the Auto Save Settings dialog box available from the Process menu to make these settings. You can view and change the settings for the output document in the Save As dialog box, each time you save a document. Except for the File name, these settings are “sticky”...
Specify where you want to save the results of document processing.
Specify the format in which to save the results by choosing from the list of options.
Specify the name of the document to save. The default name is taken from text at the top of the first page recognized; type another name if desired.
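The default-name behavior described above can be pictured with a small sketch. This is an illustration only: the guide does not say how TextBridge derives the name, so the function name `default_file_name`, the 32-character limit, and the `Untitled` fallback are all assumptions made for this example.

```python
import re

# Characters that Windows does not allow in file names.
_INVALID_CHARS = r'[\\/:*?"<>|]'


def default_file_name(recognized_text, max_length=32, fallback="Untitled"):
    """Suggest a file name from the first non-empty line of recognized text.

    Hypothetical helper: the real product's rule is not documented here.
    """
    for line in recognized_text.splitlines():
        # Drop characters that cannot appear in a Windows file name.
        candidate = re.sub(_INVALID_CHARS, "", line)
        # Collapse runs of whitespace and trim the ends.
        candidate = re.sub(r"\s+", " ", candidate).strip()
        if candidate:
            return candidate[:max_length].rstrip()
    # No usable text was found at the top of the page.
    return fallback


print(default_file_name("\n  Quarterly Report: 1999\nDear Sir"))
```

As in the Save As dialog box, the suggested name is only a starting point; the user can always type a different one.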
Language Installation

When you install TextBridge, you select one or more languages to use. If a language you want is not available at that time, check the TextBridge Web site to see if additional languages are available. TextBridge assumes your PC has the fonts needed to display text in the recognized language.
The following items describe methods for recognizing multiple languages in the same document:

Document Language Group

Before you begin to process any pages, you can change the Language Group using the Document Language Group drop down list in the Processing tab of the Page Type Settings dialog box. However, once you have a page in your document, the language group control is disabled and you cannot change the language group.
Language and Zones, Tables, and Cells TextBridge assumes that all text and table zones are in the languages that you have specified for the document. You can change the language of the selected zone, table, or table cells from the document language to any other language in the same language group.
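The two language rules above (the group is locked once the document has a page, and a zone's language must come from the document's group) can be modeled in a few lines. This is a minimal sketch, not TextBridge's implementation: the class name `Document`, its methods, and especially the contents of `LANGUAGE_GROUPS` are hypothetical placeholders, since this guide does not list which languages belong to which group.

```python
# Hypothetical group contents, for illustration only.
LANGUAGE_GROUPS = {
    "Group A": {"English", "French", "German"},
    "Group B": {"Russian", "Ukrainian"},
}


class Document:
    """Toy model of a document with a language group and pages."""

    def __init__(self, group):
        self.group = group
        self.pages = []

    def set_group(self, group):
        # The group can change only while the document has no pages.
        if self.pages:
            raise ValueError("language group is locked once a page exists")
        self.group = group

    def add_page(self, name):
        self.pages.append(name)

    def language_allowed(self, language):
        # A zone, table, or cell may use any language in the document's group.
        return language in LANGUAGE_GROUPS[self.group]


doc = Document("Group A")
doc.set_group("Group B")   # allowed: no pages yet
doc.add_page("page 1")
print(doc.language_allowed("Russian"))   # language is in the current group
print(doc.language_allowed("English"))   # language is in a different group
```

Calling `doc.set_group(...)` after `add_page` raises `ValueError`, which mirrors the disabled Document Language Group control.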
LEARNING TO USE TEXTBRIDGE

The previous chapters introduced you to TextBridge and document recognition. This chapter describes the most basic capabilities of TextBridge. You will become familiar with the basic functionality of TextBridge so that you can understand how TextBridge works. The following chapters take you from the beginning to the end of using TextBridge to process different kinds of documents.
BEFORE BEGINNING TO PROCESS A DOCUMENT

The following checklist will take you through the most important questions to ask before you start to process a document.

1. Is this document a good candidate for OCR? If you have difficulty reading a page, TextBridge may also have trouble recognizing it.
2.
TextBridge provides flexibility in performing the steps of the OCR process. You can:

Process your pages automatically or interact with processing in manual mode
Optimize processing by specifying settings using page types
View and mark parts (zones) of pages to be recognized
View and manipulate the pages of a document with page thumbnails
Process pages in any order...
To start TextBridge as a standalone application:

1. On the Windows task bar, click Start.
2. Point to Programs, then point to the TextBridge Pro Millennium folder.
3. Click TextBridge. The TextBridge main window appears (Figure 4–1).

[Figure 4–1 callouts: Menu Bar; Main toolbar; Process toolbar; View area showing welcome; Thumbnail area...]
USING AUTOMATIC PROCESSING

When you use TextBridge’s automatic processing feature, TextBridge processes pages with very little interaction with you. In automatic mode, after you select the page type and page source, TextBridge automatically recognizes your page(s). TextBridge only stops for you to add more pages and to save the results of recognition.
Figure 4–2. Click the Auto button in the TextBridge window

2. If TextBridge is getting a document from an image file, in the Get Pages dialog box, select the file to process. If TextBridge is getting a document from your scanner, you may do one or more of the following:

Click the More Pages button in the Add More Pages to Scanner dialog box (Figure 4–3) to scan another page.
TextBridge recognizes the text, saves any pictures to be placed in your output, and remembers the format for your output.

[Figure 4–3 callouts: Click Done to proceed when all pages are scanned; Click to scan more pages; Click to scan second side(s) of a two-sided document]
Figure 4–3.
USING MANUAL PROCESSING

TextBridge enables you to get remarkably accurate results from page recognition. However, page recognition is a complex process, and with some documents it can require your interaction with TextBridge to get the best output. Using manual processing, you will find a number of opportunities during page recognition that allow you to enhance the results for the particular document.
2. View and zone the page images. Click Find Zones to have TextBridge automatically find text, tables and pictures on the page or use the zoning tools to mark the zones yourself. 3. Click the Recognize button. TextBridge recognizes the page, including text, picture, and format.
Selecting the Page Source

Before you start processing a new document, you can indicate whether pages are from your scanner or an image file. To do so, click the drop down arrow on the Get Pages button to select the source of the page image: your scanner, scanner feeder, or image file (Figure 4–5).
For the best OCR results and performance, you can select the page type that best matches your original page(s). Page types make it easy to get the best settings for processing specific kinds of pages. A page type encapsulates all the processing settings for a kind of document, such as a magazine or fax.
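One way to picture a page type is as a named bundle of settings. The sketch below models that idea using only the rows that are legible in the page type table in Chapter 3. The class name `PageType`, its field names, and the `PAGE_TYPES` lookup are illustrative assumptions, not part of TextBridge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageType:
    """A named bundle of processing settings (illustrative model only)."""
    name: str
    scan_size: str
    print_type: str
    page_layout: str
    picture_output: str


# Values copied from the legible rows of the page type table in Chapter 3.
PAGE_TYPES = {
    pt.name: pt
    for pt in (
        PageType("Legal", "Legal", "Good", "Single column", "B & W"),
        PageType("Letter", "Letter", "Good", "Single column", "B & W"),
        PageType("Magazine (b & w)", "Letter", "Good", "Multi-column", "Gray"),
        PageType("Magazine (color)", "Letter", "Good", "Multi-column", "Color"),
    )
}

# Choosing a page type selects every setting at once.
chosen = PAGE_TYPES["Magazine (color)"]
print(chosen.page_layout, "/", chosen.picture_output)
```

Selecting one name instead of setting each option separately is the convenience that page types provide; a user-defined page type would simply be another entry in the lookup.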
Figure 4–7. Change settings for this page type

TextBridge provides page types for the most common types of pages. You can also define your own page types with settings optimized for other specialized types of documents.

Previewing the Page

During manual processing, TextBridge displays the image of each page in the Image view (Figure 4–8).
Check the “scan” quality of the scanned page. Delete the page, adjust scanner settings and rescan the page. Rotate the page to make the page upright. Delete the page from the document. Add more pages to the document. Cancel the process by creating a new file or opening another file. Look at the properties of the page.
Zoning the Page

Before recognizing text on a page, TextBridge finds the text, table, and picture areas (or zones) on the page (Figure 4–9). TextBridge does this automatically when processing in Automatic mode. In Manual mode, you can mark the zones yourself or click Find Zones to have TextBridge automatically zone the page.
You can use Find Zones to generate zones automatically. Then, you can adjust these zones before continuing the zoning process and recognizing the page. You can also manually zone the page. Use the text marker, table marker, picture marker, and erase marker zoning tools in the Image toolbar like highlighting markers to create and adjust zones.
You can perform these activities related to zones: Mark text, table, and picture zones. Draw irregularly shaped zones. Have TextBridge automatically Find Zones. Edit automatic zoning. Erase a zone or part of a zone. Drag a selected zone to adjust its position. Display and edit the properties of a zone (such as language).
Proofreading the Document

In manual mode, after TextBridge recognizes each page, it stops for you to proofread the recognition results (Figure 4–10). TextBridge displays recognized pages in the Text view. Click the Text tab to display the Text view if necessary. The page is laid out like the original page.
You can add corrected words to the user dictionary, which can improve recognition in subsequent pages of the same document and subsequent documents. The user dictionary is most useful for non-standard words that you frequently need to recognize, such as proper nouns and technical words. While you are still in proofreading mode, you can add pages to the final document by getting a page using either the automatic or manual process.
Figure 4–11. Saving the page using the Save As dialog box

After you save the document, your document remains in TextBridge. You can then do any of the following:

Save the document in another format.
Add or delete pages.
Change zoning.
Recognize the document again.
GETTING HELP WHILE USING TEXTBRIDGE

TextBridge is designed to be easy to learn and use. It contains many user assistance options to guide you. The goal of user assistance is to provide you with information at the time you need it and to provide it primarily from within the program. TextBridge offers you a variety of types of user assistance including context-sensitive tips, information screens, Help, an interactive assistant, online user’s guide, Release Notes, and...
[Figure 4–12 callouts: Main toolbar; View area showing welcome; Click to display Show Me How window; Uncheck to stop displaying welcome]
Figure 4–12. Welcome window

Using the Show Me How Window

In the Welcome window, click Show Me How to display the Show Me How window (Figure 4–13).
[Figure 4–13 callout: Click a topic to call the Assistant]
Figure 4–13. Show Me How window

You can also click Show Me How in the Help menu. The Show Me How window appears, and you can select what you would like to learn. After you have a page, you can also learn about the Image or Text tab tools by clicking Show Me How in the Help menu.
Getting Information from Help

The Help system provides general information about TextBridge, including getting started instructions, and step-by-step procedures for most operations. Use the How Do I? section to look for answers to real-life questions that you may have while using TextBridge.
You can get Help by using the main Help Topics window (Figure 4–14) and by performing one of the activities in the following list: Select a topic from a book in the Contents tab. Select a topic from the Index tab. Search for information about a specific word or phrase using the Find tab.
Using the TextBridge Web Site

The TextBridge Web site provides the latest product information, an up-to-date scanner list, tips, and links to related Web sites. Select ScanSoft on the Web from the Help menu to see this information. Links are provided to the ScanSoft Home Page, Product Information, Product Support, and TextBridge Updates.
SAMPLE SESSIONS WITH TEXTBRIDGE

The previous chapters have introduced you to TextBridge and document recognition. This chapter provides step-by-step instructions to teach you how to use the most important capabilities of TextBridge. The learning sessions build on each other and assume that you understand the procedures explained in the previous sessions.
USING THE SAMPLE DOCUMENTS

In this section, you will learn about the sample documents and how to open a sample document. Use the sample documents provided with TextBridge for the learning sessions in this chapter. They provide a cross-section of the types of pages that TextBridge can process.
For this session, use letter.tif (Figure 5–1).

Figure 5–1. Letter sample document

After you have started TextBridge, to find and open a sample document:

1. Select image file as the page source. Click the drop down arrow on the Get Pages button and select Image File.
[Figure 5–2 callouts: Click the page type button; Select a page type]
Figure 5–2. Select Page Type

3. Click the Get Pages button. The Get Pages dialog box appears (Figure 5–3).
Note: The default folder for image files is
C:\My Documents\TextBridge\Image Files
However, unless you installed TextBridge in another directory, sample image files are installed in
C:\Program Files\TextBridge Pro Millennium\Image Files\Samples

If Samples is not the open folder, access the sample documents folder from the Look In: box in the Get Pages dialog box.
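If you need the full path to a sample file, for example to paste it into the File name box, it can be assembled from the folder named above. The sketch below is a convenience illustration only. The helper name `sample_path` is hypothetical, and the assumption that a custom installation keeps the same `Image Files\Samples` subfolders is not confirmed by this guide.

```python
from pathlib import PureWindowsPath

# Default installation folder, as stated in this guide.
DEFAULT_INSTALL_DIR = PureWindowsPath(r"C:\Program Files\TextBridge Pro Millennium")


def sample_path(file_name, install_dir=DEFAULT_INSTALL_DIR):
    """Build the path to a sample image file.

    Assumes (unverified) that a custom install_dir keeps the same
    "Image Files\\Samples" layout as the default installation.
    """
    return PureWindowsPath(install_dir) / "Image Files" / "Samples" / file_name


print(sample_path("letter.tif"))
```

`PureWindowsPath` only manipulates the path text, so the sketch runs on any system and does not check that the file exists.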
Figure 5–4. TextBridge - Image view

For this lesson, you just want to go back to where you started without recognizing the document. This can be useful if you change your mind and want to start over without processing a document further.
SESSION 1: RECOGNIZING A SIMPLE DOCUMENT USING AUTOMATIC PROCESSING

TextBridge provides a range of powerful features. However, TextBridge is also designed to be very easy to use. For many documents, you can use default settings and automatically process a document. For this learning session, use the sample document named letter.
To process a simple document, use the following procedure:

1. Start TextBridge. The TextBridge main window appears.
2. Select the page source. Click the drop down arrow on the Get Pages button to select Image File.
3. Select the page type. Click the Page Type button and select Any Page (b&w) (Figure 5–5).
4. Click the Auto process button. The Get Pages dialog box appears (Figure 5–6).

[Figure 5–6 callout: Select an image file]
Figure 5–6. Get Pages dialog box with letter.tif selected

5. In the Get Pages dialog box, double-click the sample document, letter.tif. TextBridge reads the image file (Figure 5–7).
Figure 5–7. TextBridge - Getting Page dialog box

TextBridge then automatically zones the page and identifies text, tables, and pictures as shown in the Zoning dialog box (Figure 5–8).

Figure 5–8. TextBridge - Zoning dialog box
TextBridge automatically recognizes the characters and page layout as shown in the Recognizing dialog box (Figure 5–9).

Figure 5–9. TextBridge - Recognizing dialog box

After TextBridge reads the page image and processes it, it asks you to save the document (Figure 5–10).

[Figure 5–10 callouts: Accept the default name, or type a new name; Click Save...]
6. In the Save As dialog box, complete the following steps: In the Save in list, select the folder in which to save the text file. Be sure to notice where the document is saved so that you can find it easily. In the File name box, type a file name.
7. Compare the recognized document in your word processor with the picture of the sample document, letter.tif (Figure 5–11). Figure 5–11. Letter sample document With a word processor such as Word or WordPerfect in the print or page layout view, the recognized document should have the same or similar layout as the TIFF image or sample document.
SESSION 2: USING INSTANT ACCESS TO TEXTBRIDGE

You can use TextBridge Instant Access to run TextBridge from within another application, such as a word processor. To use Instant Access to TextBridge, simply start TextBridge from within an application, such as Word or WordPerfect. During Instant Access, TextBridge processes a document and then pastes it into the open document in your text application.
If TextBridge is still running from the previous learning session, exit from TextBridge. You can have more than one copy of TextBridge running at the same time, but it is not recommended. Before you run Instant Access to TextBridge, you may need to use the Instant Access Control Panel (Figure 5–12) to choose which applications have Instant Access.
Page 99
The Enable access to TextBridge list shows the text applications from which TextBridge can be invoked. The list includes applications commonly used with TextBridge and applications that are currently running. If your application does not appear in this list, close the TextBridge Instant Access Control Panel, start your application, and reopen the TextBridge Instant Access Control Panel.
Page 100
Figure 5–13. TextBridge... command in File menu (callout: Start Instant Access to TextBridge)
The TextBridge Instant Access dialog box appears (Figure 5–14). Notice that the Instant Access dialog box looks similar to the Page Type dialog box in the standalone version of TextBridge. Auto OCR and Manual buttons have been added, as well as choices for Page Source and Output.
Page 101
3. In the TextBridge Instant Access dialog box: In the Page Type box, click Letter. Using Letter instead of the default Any Page (b&w) is a refinement of the settings. In using Letter, you are telling TextBridge that the page is single-column and the print is good enough for black and white scanning, which is faster.
Page 102
4. In the Get Pages dialog box, double-click the sample document, letter.tif. TextBridge reads the image file, and automatically performs OCR on it, as indicated by the progress dialog boxes. After acquiring and recognizing the page, TextBridge pastes the recognized document into the open document in your word processor.
Session 3: Recognizing a Complex Document Using Manual Processing
For more complex documents such as magazine articles, you often can use TextBridge in automatic mode. However, simply using a few additional steps in manual mode can sometimes produce a more accurate result in less time.
Page 104
When you select Magazine (color) as the page type, it automatically specifies the following settings:
• Multi-column page layout
• Good print type
• Portrait orientation
For scanning, Magazine (color) page type specifies:
• Letter page size
• Color picture output
Run the standalone version of TextBridge from the Start button for this learning session.
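A page type can be thought of as a named bundle of settings. The sketch below models the Magazine (color) settings listed above as a small Python data structure. The class and field names are illustrative assumptions and are not part of TextBridge, which exposes these settings only through its dialog boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageType:
    """Hypothetical model of a page type; all field names are assumptions."""
    name: str
    page_layout: str      # for example "Multi-column"
    print_type: str       # for example "Good"
    orientation: str      # for example "Portrait"
    page_size: str        # applies to scanning only
    picture_output: str   # applies to scanning only


# Values taken from the Magazine (color) description in this session.
MAGAZINE_COLOR = PageType(
    name="Magazine (color)",
    page_layout="Multi-column",
    print_type="Good",
    orientation="Portrait",
    page_size="Letter",
    picture_output="Color",
)

print(MAGAZINE_COLOR)
```

Grouping the settings under one name mirrors why selecting a page type saves time: one choice sets several options at once.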
Page 105
4. Click the Get Pages button. The Get Pages dialog box appears (Figure 5–17).
Figure 5–17. Get Pages dialog box with complex.xif selected (callout: Select complex.xif)
5. Double-click complex.xif. TextBridge gets the page and displays it in the Image view. The page you see should be a four-column magazine article beginning with a title and pie chart.
Page 106
6. Click the Find Zones button. TextBridge automatically zones the page. TextBridge locates areas on the page to recognize and designates each area as text, table, or picture. TextBridge then stops for you to check and change the zones if necessary (Figure 5–18).
Callouts: Preview and zoning tools; Page thumbnail; Text zones...
Page 107
7. Check the results of automatic zoning. There should be text zones, a locked picture zone, and a table zone.
Click the Zoom In and Zoom Out buttons to enlarge and reduce the page to examine the zones, if necessary.
Modify automatic zoning, if necessary.
Page 108
Erase the area of the zone that connects the regular text to the reversed video text. Press and hold the left mouse button at the upper left corner of the area you want to erase. Drag the mouse diagonally across the area to erase. When you have defined the area, release the mouse button.
Page 109
Figure 5–19. Proofreading a page (callouts: Proofreading tools; Word Image window; Suspect word)
9. Use the Proofreading tools to change any words that were not accurately recognized. Examine the word in the Suspect word box. If you want a closer look at the word as it appears in the original page, look in the Word Image window, or display the word image popup by moving the cursor over the highlighted word on the page.
Page 110
If the suspect word is not the word you want, type the word you want in the Suspect box. The Suspect box drop down contains alternative suggestions for the suspect word. Click on the suggestion to change to that word. Click the Add to Dictionary button if you want the TextBridge dictionary to store a word for recognition of subsequent documents.
Page 111
11. Save the page as Magazine. TextBridge provides a suggestion for the file name and uses the type of file you selected last, automatically appending the appropriate extension. Rich Text Format (RTF) supports recomposition and is compatible with most word processing applications.
The page is like the original page, including the original layout. The document is a fully editable version of Complex in your word processor.
Note: If retain layout is not selected, or if your text application does not support retain page layout, the page will be a single column of text, also referred to as galley text, followed by pictures.
Page 113
For this example, use the sample document named scanning.tif. This document has a title heading, text with headings, a greyscale graphic, line art, reverse video text, and a multiple-column cell table. In this session you’ll learn to: Compare page types to decide which to select. Modify a page type.
Page 114
To process text, pictures, and a table:
1. Select the page source. Click the drop down arrow on the Get Pages button to select Image File.
2. Select the page type. Click the Page Type button and select Table. You may need to scroll to see the icon for the Table page type.
3.
Page 115
Figure 5–21. Original Page tab in the Page Type Settings dialog box with Multi-column selected
5. In the Page Layout area of the Original Page tab in the Page Type Settings dialog box, select Multi-column. The page layout is now multi-column instead of table text; the original settings of any orientation and good print type still apply.
Page 116
7. Click OK to close the Page Type dialog box.
8. Click the Get Pages button. The Get Pages dialog box appears.
9. Double-click scanning.tif in the Get Pages dialog box. TextBridge gets the page and displays it in the Image view, where you can preview it.
Page 117
Figure 5–22. Page with text, picture, and table zones (callouts: Find zones; Manual zoning tools; Highlighted zones)
11. Check the results of automatic zoning. There should be two picture zones, several text zones, and one table zone. Check that the entire table is included in one table zone.
Page 118
If you need to resize a zone: Draw more with the zoning tools, or erase parts of the zones with the erase tool. Use the erase tool to separate the page title from the first paragraph. 12. Click Recognize. TextBridge recognizes the page, then stops for you to proofread the text (Figure 5–23).
Page 119
13. Change the recognition confidence level. The default confidence level is Show Suspect Words. If you change the confidence level to Show Highly Suspect Words, TextBridge will raise its confidence level, and fewer words will appear as suspects. If you change the confidence level to Show Somewhat Suspect Words, TextBridge will lower its confidence level, and more words will appear as suspects.
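The effect of the three confidence levels can be illustrated with a small sketch. It assumes that each recognized word carries a confidence score between 0 and 1 and that each level corresponds to a cutoff below which a word is flagged. The scores, the cutoff values, and the function name are invented for illustration and do not describe how TextBridge scores words internally.

```python
# Hypothetical cutoffs: a word is flagged when its score is below the cutoff.
CUTOFFS = {
    "Show Highly Suspect Words": 0.50,    # flags the fewest words
    "Show Suspect Words": 0.75,           # the default level
    "Show Somewhat Suspect Words": 0.90,  # flags the most words
}


def suspect_words(scored_words, level):
    """Return the words whose score falls below the cutoff for `level`."""
    cutoff = CUTOFFS[level]
    return [word for word, score in scored_words if score < cutoff]


# Invented sample scores for four recognized words.
page = [("Scanning", 0.98), ("lndustry", 0.40), ("Booming", 0.70), ("is", 0.85)]

for level in CUTOFFS:
    print(level, "->", suspect_words(page, level))
```

With these sample scores, each step from Show Highly Suspect Words toward Show Somewhat Suspect Words flags one more word for proofreading.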
Page 120
16. Save the page as Scanning.rtf. Be sure to select Retain pictures and Retain layout. TextBridge formats and saves the document.
17. Open Scanning.rtf in your word processor.
Figure 5–24. Scanning sample document
The page is like the original page with the original layout including the pictures and table.
Page 121
19. Reset the Table page type in TextBridge.
• Click the Page Type button.
• Highlight Table.
• Click the Settings button.
• Click the Reset button. The original settings for the Table page type will be restored.
Where to Go From Here
The learning sessions in this chapter were designed to give you a solid basis on which to use TextBridge for your own documents.
Advanced Sample Sessions
Previous chapters have introduced you to basic TextBridge capabilities. This chapter provides sample sessions with step-by-step instructions for using several more advanced TextBridge functions. The topics presented in this chapter are in the following list:
• Processing a document to use in a database
• Using zone templates and page types
• Training TextBridge OCR
This chapter uses the same sample documents described in...
Page 123
For this learning session, use the image file named table.bmp. This image file has a heading followed by a table in cell format with gridlines containing dates, names, and telephone numbers. To process this document for use in a database: 1.
Page 124
Note: The default folder for image files is C:\My Documents\TextBridge\Image Files. However, unless you installed TextBridge in another directory, sample image files are installed in C:\Program Files\TextBridge Pro Millennium\Image Files\Samples. If Samples is not the open folder, access the sample documents folder from the Look In: box in the Get Pages dialog box.
Page 125
Figure 6–2. Zoned table.bmp in Image View (callouts: Click the Select button; Table zoned with cell borders)
5. Click the Select Zone button on the toolbar and double-click on the table. The table editing tools replace the zoning tools (Figure 6–3).
Callouts: Draw hidden table cell border; Merge table cells; Draw visible table cell border...
Page 126
7. Click the Recognize button. TextBridge recognizes the page and displays it in the Text view, where you can proofread and correct any poorly recognized words (Figure 6–4).
Figure 6–4. Table in Text view
8. Click the Save As button. The Save As dialog box appears (Figure 6–5, next page).
Page 127
Figure 6–5. Save As dialog box (callouts: Accept the default name, or type a new name; Click Save; Select Text tab-delimited output format; Deselect Open file when done)
9. Save the document in text tab-delimited format. In the Save As dialog box, TextBridge provides a suggestion for the file name.
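Tab-delimited text is easy to load into a database or script because each table row becomes one line and each cell is separated by a tab character. A minimal sketch using Python's standard csv module follows. The column names and sample rows are invented for illustration and are not the contents of table.bmp.

```python
import csv
import io

# Invented sample standing in for a file saved in text tab-delimited format.
sample = (
    "Date\tName\tTelephone\n"
    "01/05/99\tA. Smith\t555-0100\n"
    "02/11/99\tB. Jones\t555-0101\n"
)


def read_tab_delimited(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_tab_delimited(sample)
for row in rows:
    print(row["Name"], row["Telephone"])
```

To read a saved file instead of an in-memory string, open it with open(path, newline="") and pass the file object to csv.DictReader; the path depends on where you saved the document.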
Session 2: Using Zone Templates and Page Types
TextBridge provides zone templates so that you can repeatedly process or ignore specific areas on pages of the same type, saving time because you do not have to rezone each page. After you create a set of zones, TextBridge lets you save the current set of zones (including their size, location, and type) as a zone template.
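Conceptually, a zone template is a reusable list of zones, each with a position, a size, and a type. The sketch below models that idea in Python. The class names, units, and coordinates are assumptions for illustration and say nothing about the actual zone template file format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Zone:
    left: int     # upper left corner of the zone (hypothetical units)
    top: int
    width: int
    height: int
    kind: str     # "text", "table", or "picture"


def apply_template(template: List[Zone], page_names: List[str]) -> Dict[str, List[Zone]]:
    """Give every page the same zones, so that no page needs rezoning."""
    return {name: list(template) for name in page_names}


# Invented coordinates for a newsletter-style layout.
newsletter = [
    Zone(50, 40, 700, 80, "text"),
    Zone(50, 140, 340, 300, "picture"),
    Zone(410, 140, 340, 300, "table"),
]

zoned = apply_template(newsletter, ["issue_january", "issue_february"])
print(zoned["issue_february"][2].kind)
```

Because the template is stored once and applied to each page, changing a zone in the template changes it for every page processed afterward, which is the time saving the session describes.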
Page 129
2. Click Get Pages. The Get Pages dialog box appears. 3. Double click Scanning.tif in the Get Pages dialog box. TextBridge gets the page, and displays it in the Image view where you can create a zone template. The page you see should be titled “Scanning Industry is Booming.”...
Page 130
Figure 6–6. Page with text, picture, and table zones
5. Save the zone template. In the Tools menu, select Save Zone Template. The Save Zone Template dialog box appears (Figure 6–7, next page).
Page 131
Figure 6–7. Save Zone Template dialog box (callouts: Specify the default location; Specify the file name; Save the template)
Select the default location to save the zone template file. To specify your zone template in Page Type settings, you must save the template in the default folder, Zone Templates. However, if you save the zone template to another location, you can still load it using the Load Zone Template command available from the Tools menu.
Page 132
Figure 6–8. Page Type Settings–Magazine (b&w) dialog box (callout: Click to create a new page type)
7. Create a new page type. In the Page Type Settings dialog box, click New to open the New Page Type dialog box (Figure 6–9).
Callouts: Type the new name; Enter a description
Figure 6–9.
Page 133
Click OK to close the New Page Type dialog box and return to the Page Type Settings dialog box (Figure 6–8). Click OK to close the Page Type Settings dialog box.
8. Select the new page type. Select My Newsletter in the Page Type dialog box. Click Settings to open the Page Type Settings dialog box (Figure 6–10).
Page 134
10. Begin a new document. You are now ready to process the next month’s Scanning News with your page type and zone template. Select the New command from the File menu. TextBridge warns you that you have not saved the current pages.
Session 3: Training TextBridge OCR
To ensure the highest possible accuracy, TextBridge provides an interactive training capability. This feature enables you to participate in the OCR process and train TextBridge by verifying correctly recognized words and correcting recognition errors. With training, TextBridge achieves higher accuracy for this specific page and any other pages like it.
Page 136
2. Enable training. Click the drop down arrow on the Recognize button and select Enable Training (Figure 6–11).
Figure 6–11. Enable training (callouts: Click the Recognize drop down arrow; Select Enable Training)
This “sticky” setting remains in place for all subsequent documents until you disable training.
Page 137
4. In the Get Pages dialog box, double-click fax.pcx. TextBridge opens the page and begins recognition. When TextBridge is unsure of a word, it stops to enable you to train OCR. The Training dialog box appears (Figure 6–13).
Callouts: Click when the word is correct; Click when you are done training; Suspect word...
Page 138
Sometimes TextBridge recognizes stray marks, handwritten notes, or dirt on the original page as characters. If the word image is not a word, click Not a Word. TextBridge continues on to the next word. To undo your last action, click the Undo button. For purposes of this session, repeat this process until you have trained OCR on at least a few words.
Page 139
7. In the Save Training Data dialog box:
• Save training data in the Training Data folder.
• Enter a file name. Save the file with a .trn extension.
• Click the Save button.
The Save Training Data dialog box closes, and the Save As dialog box opens (Figure 6–15).
Page 140
9. View the file in your word processor.
Figure 6–16. Fax sample document
Notice that, even though the input document was a low-quality fax image, TextBridge recognized it with a high degree of character recognition and formatting accuracy. You can use the saved training data to improve the recognition of documents of similar quality and with the same fonts.
Where to Go From Here
The learning sessions in this chapter were designed to give you a solid basis on which to use TextBridge for your own documents. For more information about TextBridge, please refer to the Help.
Page 142
Index
Accept button, 6–16
Accepting a suspect word, 5–26
Adding a word to the dictionary, 5–27
Adobe Acrobat Reader, viii
Any Page page types, 3–2, 5–7
Application formats supported, 1–7
Applications supporting recomposition, 1–7
Assistant, 1–5, 4–21
Automatic processing, 4–5, 5–7
Automatic zoning, 5–33, 6–8
Autorun program, 2–9
Basic operations, 4–9...
Page 143
Database documents, 5–1
Default folder for image files, 5–5, 6–3
Deferred processing, 1–9
De-installing a previous version of TextBridge, 2–7
De-installing TextBridge Pro Millennium, 2–13
Dialog boxes
  Get Pages, 5–9, 5–19, 5–22, 5–33
  Getting Page, 5–10
  Instant Access control panel, 5–15
  Instant Access to TextBridge, 5–17
  New Page Type, 6–11
  Open, 5–4...
Page 144
Fax documents, 1–10
Fax page type, 3–2
Find Zones button, 5–33
Foreign language recognition, 1–10
Formats supported, 1–7
Formatting with paragraph styles, 3–5
Forms, 4–14
Get Pages button, 5–4
Get Pages dialog box, 5–4
Getting Page dialog box, 5–10
Grayscale images, 1–4, 1–11, 3–4
Grid lines, 5–29, 6–3
Help system, x, 4–20
HTML output, 1–9...
Page 145
Language and Zones, Tables, and Cells, 3–18
Language installation, 1–2, 2–5, 3–16
Language recognition, 1–5, 1–10, 3–15
Learning sessions, 5–1, 6–1
Legal page type, 3–2
Letter page type settings, 3–2, 5–14
Live updates to TextBridge Pro, 1–4, 2–12, 4–25
Location of sample image files, 5–5, 6–3
Magazine (b&w) and (color) page type, 3–2, 5–21
Manual processing, 4–8, 5–20
Manual zoning, 1–9...
Page 148
Scanners, 1–3, 1–8
ScanSoft on the Web, xi, 2–12, 4–25
Selecting page type, 4–11
Selecting page source, 4–10
Serial number, xii
Setup program, 2–9
Show Me How window, 4–21
Software registration card, 1–2
Software serial number, xii
Software version number, xii
Spreadsheet recomposition, 1–6
Standalone application, 3–6
Starting a new document, 6–13...
Page 149
TextBridge
  about the user’s guide, vii
  adding more pages, 4–6
  assistant, 1–5, 4–21
  automatic processing, 4–5, 5–7
  basic operations, 4–9
  CD-ROM, 2–8
  custom dictionary, 1–8
  database documents, 5–1
  de-installation of a previous version, 2–7, 2–13
  deferred processing, 1–9
  disk space requirements, 2–5
  dynamic training, 1–8
  files from older versions, 2–7
  Help system, 4–20, 4–23...
Page 150
TextBridge (cont.)
  sample documents, 5–2
  scanner installation and setup, 2–4, 2–10
  scanners supported, 1–3, 1–8
  setup program, 2–9
  starting, 4–3
  Technical Support for, xi
  text view, 5–26, 6–5
  tips, 4–22
  tutorials for using, 5–1, 6–1
  two-sided documents, 1–9, 4–6
  types of documents it can OCR, 1–10
  uninstalling, 2–6, 2–13
  user assistance, 4–20
  Web site, 1–3, 4–25...
Page 151
Version number, xii
Visioneer sheetfed scanner, 2–3
Ways You Can Use TextBridge, 4–2
Web site, 4–25
Welcome window, 4–20
What’s This? Help, 1–6
Windows, 1–4, 2–5
Word Image window, 5–26
Xerox PARC, 1–10
Zone order, 4–15
Zone templates, 1–9, 6–7
  files from older versions of TextBridge, 2–7
  saving, 6–9
Zones, 1–9, 4–14, 5–23...