This is version 22021v028 of this document.

How Digitizing Works


INTRODUCTION

The diagrams at the end of this article illustrate the way in which a document is digitized. The second diagram adds an illustration of dynamic thresholding.

The Light Bulb, CCD, Photon, and Electron

Light strikes the moving paper.

A (representative) row of square pixels is illuminated. The row of pixels in the diagram shows only 4 squares. In fact, when an 8 1/2 by 11 inch (letter size) sheet of paper is scanned at 300 dpi (dots per inch), each row contains 2550 pixels (8 1/2 inches times 300 dpi). The paper moves to a new location and the row of 2550 pixels is scanned again, 3300 times in all (11 inches times 300 dpi), to cover the page in the direction of paper movement.
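The pixel counts above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name scan_dimensions is illustrative and does not come from any scanner software.

```python
# Page size and resolution taken from the text: 8 1/2 by 11 inches at 300 dpi.
WIDTH_INCHES = 8.5
HEIGHT_INCHES = 11
DPI = 300


def scan_dimensions(width_in, height_in, dpi):
    """Return (pixels per row, number of rows, total pixels) for one page."""
    pixels_per_row = round(width_in * dpi)
    rows = round(height_in * dpi)
    return pixels_per_row, rows, pixels_per_row * rows


pixels_per_row, rows, total = scan_dimensions(WIDTH_INCHES, HEIGHT_INCHES, DPI)
print(pixels_per_row, rows, total)  # 2550 3300 8415000
```

At one bit per pixel, that is about 8.4 million bits for a single uncompressed page.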

The light reflected from one (representative) pixel passes through a lens.

Then the light passes through a Red, Green, or Blue filter, or through no filter at all.

Each photon of light knocks an electron loose.

The loose electrons fall through a diode.

The diode traps the electrons on a capacitor.

The charge stored in each capacitor (one behind each photoconducting sensor) hops down the CCD (Charge Coupled Device).

At the end of the charge coupled device there is an analog to digital (A to D) converter.
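The readout step can be pictured as a shift register: on each clock tick the charge packet next to the converter is measured and every other packet hops one cell closer. The sketch below is only an illustration of that idea. The names read_out and full_well, the capacity of 1000 electrons, and the 8 bit converter are assumptions made for the example, not properties of any particular CCD.

```python
def read_out(charges, full_well=1000, bits=8):
    """Shift charge packets toward the converter and digitize each in turn.

    charges   -- electrons stored in each cell; the last cell is next to the converter
    full_well -- assumed largest charge a cell can hold (hypothetical value)
    bits      -- assumed resolution of the A to D converter (hypothetical value)
    """
    register = list(charges)
    top_code = 2 ** bits - 1
    samples = []
    for _ in range(len(charges)):
        packet = register.pop()   # the packet beside the converter leaves the register
        register.insert(0, 0)     # every remaining packet hops one cell along
        clipped = min(max(packet, 0), full_well)
        samples.append(clipped * top_code // full_well)
    return samples


# The cell nearest the converter is read first.
print(read_out([0, 500, 1000]))  # [255, 127, 0]
```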

Deciding if the Pixel is a One (White) or a Zero (Black)

The A to D converter decides whether the amount of light reflected should be counted as black or white. This decision is recorded as a one (white) or a zero (black).
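As a rough sketch of this decision, suppose the converter first measures each pixel as a brightness from 0 (darkest) to 255 (brightest) and then compares it with a fixed threshold. The 0 to 255 scale, the threshold of 128, and the name binarize are assumptions made for the example.

```python
def binarize(gray_row, threshold=128):
    """Map brightness readings (0 = darkest, 255 = brightest) to 1 (white) or 0 (black)."""
    return [1 if level >= threshold else 0 for level in gray_row]


row = [250, 240, 30, 20, 245, 128, 127]
print(binarize(row))  # [1, 1, 0, 0, 1, 1, 0]
```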

Dynamic Threshold

If the background of a page is stained, the stained area can be dark enough to count as black. To fix this, the threshold between black and white is lowered. This causes the darker background to be counted as white, eliminating the muddy effect usually created by stains.

If the scanner decides when to lower the threshold (because there is a stain) then it is called dynamic thresholding.
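Scanner makers do not generally publish their thresholding methods, so the following is only one simple way the idea could work, not a description of any real product. It sets the threshold for each pixel from the average brightness of its neighbors, so the threshold drops automatically over a stain. The window size of 5, the factor of 0.7, and the name dynamic_binarize are assumptions made for the example.

```python
def dynamic_binarize(gray_row, window=5, factor=0.7):
    """Binarize a row using a threshold derived from each pixel's neighborhood.

    A pixel counts as white (1) when it is at least `factor` times as bright
    as the average of the pixels around it, and as black (0) otherwise.
    """
    half = window // 2
    bits = []
    for i, level in enumerate(gray_row):
        start = max(0, i - half)
        stop = min(len(gray_row), i + half + 1)
        neighborhood = gray_row[start:stop]
        local_threshold = factor * sum(neighborhood) / len(neighborhood)
        bits.append(1 if level >= local_threshold else 0)
    return bits


# A stained background (brightness 100) with one dark stroke of text (brightness 20).
stained = [100, 100, 20, 100, 100]

# A fixed threshold of 128 turns the whole stained area black.
print([1 if level >= 128 else 0 for level in stained])  # [0, 0, 0, 0, 0]

# The local threshold keeps the background white and the stroke black.
print(dynamic_binarize(stained))  # [1, 1, 0, 1, 1]
```

A neighborhood average is easy to follow but has limits. For example, a large solid black area would be treated as background, so practical methods are likely to be more elaborate.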

Grayscale, 24 Bit Color, and 16 Bit X-Rays

If white is divided into light-white and dark-white, and black is divided into light-black and dark-black, then two bits can be used to record four gray levels.

And so on until 8 bits are used to record 256 different gray levels.
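The subdivision into gray levels can be sketched as follows, assuming the sensor reading has already been scaled to the range 0 to 255. The function name quantize and the scaling are assumptions made for the example.

```python
def quantize(level, bits, max_level=255):
    """Reduce a brightness reading (0..max_level) to one of 2**bits gray levels."""
    levels = 2 ** bits
    return min(level * levels // (max_level + 1), levels - 1)


# Two bits give four levels, numbered from darkest to lightest.
names = ["dark-black", "light-black", "dark-white", "light-white"]
for reading in (10, 100, 150, 250):
    print(reading, quantize(reading, 2), names[quantize(reading, 2)])

print(quantize(127, 1), quantize(128, 1))  # 0 1 (one bit: black or white)
print(quantize(200, 8))                    # 200 (eight bits keep all 256 levels)
```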

If filters were used, then there are 8 bits for each of the three colors (Red, Green, and Blue) producing 24 bit color.
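Three 8 bit readings fit side by side in one 24 bit number. The sketch below places Red in the highest 8 bits. That ordering is a common convention but is an assumption here, as are the function names.

```python
def pack_rgb(red, green, blue):
    """Combine three 8 bit color readings into one 24 bit value."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each color must fit in 8 bits (0 to 255)")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(pixel):
    """Split a 24 bit value back into its Red, Green, and Blue readings."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


orange = pack_rgb(255, 128, 0)
print(f"{orange:06X}")     # FF8000
print(unpack_rgb(orange))  # (255, 128, 0)
print(2 ** 24)             # 16777216 possible colors
```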

The numbers of gray levels that can be recorded with 1 through 16 bits are the powers of two (2, 4, 8, 16, and so on up to 65,536), numbers that are very common in computing.

16 bits are used to record the 65,536 shades of gray needed to distinguish cancer tumors from surrounding tissue in X-rays.
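The number of gray levels doubles with each added bit. A short sketch that lists the values for 1 through 16 bits (the name gray_levels is illustrative):

```python
def gray_levels(bits):
    """Number of distinct gray levels that can be recorded with the given number of bits."""
    return 2 ** bits


for bits in range(1, 17):
    print(f"{bits:2d} bits -> {gray_levels(bits):,} gray levels")
```

The list runs 2, 4, 8, 16, 32, 64, 128, 256, and so on up to 65,536 at 16 bits.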

ASCII Digitizing the Alphabet

Writing the alphabet into the grayscale divisions digitizes the alphabet and produces ASCII (the American Standard Code for Information Interchange, a standard published by ANSI, the American National Standards Institute). A capital ‘A’ in ASCII is recorded as ‘41’ in hexadecimal, which is ‘0100 0001’ in binary. It is not necessary to learn to do arithmetic in octal or hexadecimal to see this; it can be shown graphically.
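The hexadecimal and binary forms of a character can also be printed directly. This is a small sketch; the function name describe is illustrative.

```python
def describe(ch):
    """Return the ASCII code of a character as (hexadecimal, binary in two groups of four)."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    binary = f"{code:08b}"
    return f"{code:02X}", f"{binary[:4]} {binary[4:]}"


print(describe("A"))  # ('41', '0100 0001')
print(describe("a"))  # ('61', '0110 0001')
```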

Music, Audio, and other Sounds

For music, you merely record the position of a person’s eardrum 44 thousand times per second to an accuracy of 1 in 65,536 using 16 bits or two bytes (in common usage).

Rounding to 50 thousand sample times per second and two samples per sample time (for stereo) and two bytes per sample, this produces 200 thousand bytes per second.

200 thousand bytes per second times 3,600 seconds per hour (60 minutes per hour times 60 seconds per minute) produces 720 million bytes per hour.

720 million bytes per hour is an estimate that compares favorably with the 650 million bytes that are said to be on a CD that holds 74 minutes of music.
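The estimate above can be reproduced step by step. The sketch below uses the rounded figure of 50 thousand samples per second from the text, and also the unrounded 44 thousand for comparison. The function names are illustrative.

```python
SECONDS_PER_HOUR = 60 * 60


def bytes_per_second(sample_rate, channels=2, bytes_per_sample=2):
    """Data rate for uncompressed audio: samples per second x channels x bytes per sample."""
    return sample_rate * channels * bytes_per_sample


def bytes_per_hour(sample_rate, channels=2, bytes_per_sample=2):
    """Data volume for one hour of uncompressed audio."""
    return bytes_per_second(sample_rate, channels, bytes_per_sample) * SECONDS_PER_HOUR


print(bytes_per_second(50_000))  # 200000
print(bytes_per_hour(50_000))    # 720000000
print(bytes_per_hour(44_000))    # 633600000
```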

[Figure 22021v028_0.gif: how a document is digitized.]

[Figure 22021v028_1.gif: the same diagram with dynamic thresholding added.]

Note to Readers

Updates and More Detailed Descriptions

When using the information in this article, please check the website http://www.ArchiveBuilders.com for updates. The version number of this article is 22021v028. The website also has articles that provide more details on some of the terms and concepts in this article.

Comments

Please let us know how you like this paper, or if you have any questions. What would you like to see in the future? Also, please let us know where you saw this paper. For more, and for the most recent version of this article, please visit our web site at http://www.ArchiveBuilders.com. Please send your comments via email to SteveGilheany@ArchiveBuilders.com. Tel: +1 310-937-7000. Fax: +1 310-937-7001.

Acknowledgements

Reprinted from Archive Planning, Volume 4, number 3, 2000, Archive Builders' analysis newsletter for document management. See http://www.ArchiveBuilders.com. All trademarks are the property of their respective holders.

Note to Editors

Paper 22021v028

We will continue to update these articles as we get comments. Please contact us for the most current version before you publish. Also, please request permission to publish the article. Permission will be given freely for most purposes.

Steve Gilheany
Archive Builders
1209 Manhattan Ave., PMB C-14
Manhattan Beach, CA 90266
Tel: +1 310-937-7000 Fax: +1 310-937-7001
SteveGilheany@ArchiveBuilders.com

Dividing this Article into Parts for Serialization

If you decide to divide this article into parts please print at least the updates, comments, and acknowledgements sections in each of the parts along with: ‘by SteveGilheany@ArchiveBuilders.com’.

Bio

Steve Gilheany, BA in Computer Science, MBA, MLS Specialization in Information Science, CDIA (Certified Document Imaging System Architect), AIIM Master (MIT) and AIIM Laureate (LIT) of Information Technologies, and CRM (Certified Records Manager, ARMA), has nineteen years of experience in document imaging and is a Sr. Systems Engineer at Archive Builders.

Author

Steve Gilheany is a Sr. Systems Engineer at Archive Builders. He has worked in digital document management and document imaging for nineteen years.

His experience in the application of document management and document imaging in industry includes: aerospace, banking, manufacturing, natural resources, petroleum refining, transportation, energy, federal, state, and local government, civil engineering, utilities, entertainment, commercial records centers, archives, non-profit development, education, and administrative, engineering, production, legal, and medical records management. At the same time, he has worked in product management for hypertext, for windows based user interface systems, for computer displays, for engineering drawing, letter size, microform, and color scanning, and for xerographic, photographic, newspaper, engineering drawing, and color printing.

In addition, he has nine years of experience in data center operations and database and computer communications systems design, programming, testing, and software configuration management. He has an MLS Specialization in Information Science and an MBA with a concentration in Computer and Information Systems from UCLA, a California Adult Education teaching credential, and a BA in Computer Science from the University of Wisconsin at Madison. His industry certifications include the CDIA (Certified Document Imaging System Architect); the AIIM Master (MIT) and AIIM Laureate (LIT) of Information Technologies (from AIIM International, the Association for Information and Image Management, http://www.AIIM.org); and the CRM (Certified Records Manager) (from the ICRM, the Institute of Certified Records Managers, an affiliate of ARMA International, the Association of Records Managers and Administrators, http://www.ARMA.org).

For more information, courses, and papers:

http://www.ArchiveBuilders.com