Archive Builders: (310) 937-7000 SteveGilheany@ArchiveBuilders.com


Articles With
More Information
Archive Builders
Home Page
Available and
Past Courses
Upcoming and
Past Presentations

Computer Storage Requirements for Various Digitized Document Types


Scanned Letter Size Pages

1 scanned page (8 1/2 by 11 inches, A4) = 50 KiloBytes (KByte) (on average, black & white, CCITT G4 compressed)

1 file cabinet (4 drawer) (10,000 pages on average) = 500 MegaBytes (MByte) = 1 CD (ROM or WORM)

2 file cabinets = 1,000 MBytes = 1 GigaByte (GByte); 10 file cabinets = 1 DVD (WORM) (see below)

2,000 file cabinets = 1,000 GigaBytes = 1 TeraByte (TByte); 2,000 file cabinets = 200 DVDs

1 box (in inches: 12 wide x 15 long x 9.5 deep) (2,500 pages) = 1 file drawer = 2 linear feet of files = 125 MBytes

8 boxes = 16 linear feet = 2 file cabinets = 1 GByte; 8,000 boxes = 16,000 linear feet = 1,000 GBytes = 1 TByte
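Every figure in this section follows from one average: about 50 KBytes per compressed black & white letter size page. The Python sketch below turns a page count into storage and physical volume. It assumes decimal units and the averages given above (10,000 pages per 4 drawer file cabinet, 2,500 pages per box, 500 MBytes per CD). The function name `estimate_scanned_storage` is illustrative only.

```python
# Averages taken from the article; real collections will vary.
BYTES_PER_PAGE = 50_000        # 50 KBytes, black & white, CCITT G4 compressed
PAGES_PER_CABINET = 10_000     # 4 drawer file cabinet
PAGES_PER_BOX = 2_500          # 12 x 15 x 9.5 inch box = 1 file drawer
LINEAR_FEET_PER_BOX = 2
CD_BYTES = 500_000_000         # the article's working figure for 1 CD


def estimate_scanned_storage(pages: int) -> dict:
    """Estimate storage and physical volume for scanned letter size pages."""
    total_bytes = pages * BYTES_PER_PAGE
    boxes = pages / PAGES_PER_BOX
    return {
        "bytes": total_bytes,
        "gigabytes": total_bytes / 1_000_000_000,
        "file_cabinets": pages / PAGES_PER_CABINET,
        "boxes": boxes,
        "linear_feet": boxes * LINEAR_FEET_PER_BOX,
        "cds": total_bytes / CD_BYTES,
    }


if __name__ == "__main__":
    # 1 file cabinet, 2 file cabinets, and 2,000 file cabinets
    for pages in (10_000, 20_000, 20_000_000):
        print(pages, estimate_scanned_storage(pages))
```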

Scanned Engineering Drawings / Large Format Documents

1 E size drawing (48 inches by 36 inches) = 16 letter size pages (8 ½ by 11 inches, A4) = 800 KBytes

D size = 8 letter size pages; C size = 4 pages; B size = 2 pages; A size = 1 page.

Drawing sizes in inches, current size first, old size in parentheses, ISO metric equivalent in brackets:

E size [A0] = 44 x 34 (48 x 36); scanners have to accommodate the old E size of 48 in. x 36 in.

D size [A1] = 34 x 22 (36 x 24)

C size [A2] = 22 x 17 (24 x 18)

B size [A3] = 11 x 17 (12 x 18)

A size [A4] = 8 ½ x 11 (9 x 12)

F size = 28 x 40

Roll sizes: G size = 11 x 22 ½ to 11 x 90; H size = 28 x 44 to 28 x 143; J size = 34 x 56 to 34 x 176; K size = 40 x 56 to 40 x 143

105 mm microfiche is the metric A6 size.

Newspapers: a double truck (center fold) full broadsheet is 24 in x 36 in, equivalent to an old D size drawing.
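The page equivalents above follow from sheet area. In both the new and the old series, each size has twice the area of the size below it. The sketch below checks these ratios with exact fractions. It assumes that storage scales with area, which holds only if drawings are about as dense as letter pages. The helper names are hypothetical.

```python
from fractions import Fraction

# Sheet sizes in inches, from the article: new sizes and the old sizes
# that scanners still have to accommodate.
NEW_SIZES = {"A": (Fraction(17, 2), 11), "B": (11, 17), "C": (17, 22), "D": (22, 34), "E": (34, 44)}
OLD_SIZES = {"A": (9, 12), "B": (12, 18), "C": (18, 24), "D": (24, 36), "E": (36, 48)}

BYTES_PER_A_PAGE = 50_000  # one compressed black & white A (letter) size page


def a_size_equivalents(name: str, sizes: dict) -> Fraction:
    """Area of a sheet divided by the area of the A size sheet in the same series."""
    w, h = sizes[name]
    a_w, a_h = sizes["A"]
    return Fraction(w) * h / (Fraction(a_w) * a_h)


def estimated_bytes(name: str, sizes: dict) -> int:
    """Scale the per-page estimate by area (assumes similar drawing density)."""
    return int(a_size_equivalents(name, sizes) * BYTES_PER_A_PAGE)


if __name__ == "__main__":
    for name in "ABCDE":
        print(name, a_size_equivalents(name, NEW_SIZES),
              a_size_equivalents(name, OLD_SIZES), estimated_bytes(name, OLD_SIZES))
```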

Scanned Microforms

1 roll of 16 mm microfilm (100 ft) = 2,500 letter size images = 1 box = 1 file cabinet drawer = 125 MBytes

1 roll of 35 mm microfilm (100 ft) = 5,000 letter size images (or letter size image equivalents) = 250 MBytes

1 microfiche (105 mm film) = 100 letter size images = 5 MBytes (average); 200 fiche = 20,000 images = 1 GByte

In many record series, each microfiche contains only a few images because each fiche represents a single record in the series (e.g. one fiche per person in a personnel record series). In this case filming breaks at the end of each record rather than running continuously. To a lesser extent this is also true for roll film. In these cases, the amount of storage required depends on the number of images on the film, not on the number of microfiche or rolls of film.
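Storage follows the number of images rather than the number of fiche or rolls, so an estimate should sum the image counts. The sketch below assumes 50 KBytes per letter size image, as for scanned pages. The personnel series with 3 images per fiche is a made-up illustration, not data from the article.

```python
from typing import Iterable

BYTES_PER_IMAGE = 50_000  # one letter size image, as for a scanned page


def microform_storage(images_per_unit: Iterable[int]) -> int:
    """Total bytes for a set of fiche (or rolls), given the image count on each one."""
    return sum(images_per_unit) * BYTES_PER_IMAGE


# 200 fully loaded fiche at 100 images each
full = microform_storage([100] * 200)
# 200 personnel fiche holding only 3 images each (hypothetical record series)
sparse = microform_storage([3] * 200)
print(full, sparse)
```

The two collections have the same fiche count, but the sparse one needs about 3% of the storage.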

Scanned aperture card images require the same storage as the document or drawing in the aperture.

Scanned Miscellaneous Documents

1 check (2 sided) (remittance) = 50 KBytes per item, 25 KBytes (1 sided), less if no background patterns are present

1 credit card receipt (long: 3.5 x 6.5 inches, 2 sided) (remittance) = 35 KBytes, (short: 3.5 x 4.75 in., 2 sided) = 25 KBytes

1 library book (average, scanned in black and white) = 10 MBytes; 50 books = 500 MBytes = 1 CD; 100 books = 1 GByte

Digitized Multimedia Formats

1 hour of compressed color video = 2 GigaBytes (DVD, MPEG 2) (image quality dependent)

1 hour of audio = 10 MBytes (dictation, answering machine) to 500 MBytes (a CD holds 74 minutes of music)

1 color picture = 10 KBytes (thumbnail) to 5 MBytes (for each of 100 photos on a 500 MByte photo CD)
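Dividing a per-hour size by 3,600 seconds gives the average bit rate a format needs. That makes these figures comparable with the communications speeds given elsewhere in this article. The sketch below uses the article's figures. `average_bit_rate` is a hypothetical helper name.

```python
def average_bit_rate(bytes_per_hour: float) -> float:
    """Average bits per second needed to produce a given number of bytes per hour."""
    return bytes_per_hour * 8 / 3600


for label, size in [
    ("compressed color video (DVD, MPEG 2)", 2_000_000_000),
    ("dictation / answering machine audio", 10_000_000),
    ("audio, upper figure above", 500_000_000),
]:
    print(f"{label}: {average_bit_rate(size) / 1000:,.1f} Kbit per second")
```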

The size of a compressed file depends on the resolution (DPI: Dots Per Inch) and the detail (information) in the photograph. The detail in a photograph depends on the size of the negative and the quality of the film, the camera, and the lens. It is not related to the print size unless the print is smaller than the negative. The resolution of the scan should be chosen to match the detail of the photograph. For most cameras, films, and formats 35 mm and smaller, the 5 MByte Photo CD format (3072 by 2048 pixels) captures all the information in the image. Note that this resolution is given in dots per image rather than dots per inch. Display resolutions are also given in dots per image (H x V, e.g. 1024 x 768).
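The sketch below makes the distinction between dots per image and dots per inch concrete. It assumes a 35 mm still frame of 36 x 24 mm and 24 bit color (3 Bytes per pixel); neither is stated above. Under those assumptions, a 3072 by 2048 pixel scan works out to about 2,167 dots per inch on the film and about 19 million Bytes before compression. The function name is illustrative.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels: int, film_mm: float) -> float:
    """Dots per inch on the film when `pixels` dots span `film_mm` millimeters."""
    return pixels / (film_mm / MM_PER_INCH)


# Assumed 35 mm still frame of 36 x 24 mm, scanned to the 3072 x 2048 Photo CD size.
horizontal = effective_dpi(3072, 36)
vertical = effective_dpi(2048, 24)
uncompressed_bytes = 3072 * 2048 * 3   # assumes 24 bit RGB, 3 Bytes per pixel

print(round(horizontal), round(vertical), uncompressed_bytes)
```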

Medical Records

1 Chest X-ray (14 x 17 inches) = 1 MegaByte: 150 DPI (Dots Per Inch), 12 bits (compressed)

12 bits per pixel provide 4,096 shades of gray. Wavelet compression in lossless mode has FDA 510(k) approval. 150 DPI, 12 bit images are recommended by the American College of Radiology for primary reads.

For secondary reads, a 14 x 17 chest X-ray = 200 KiloBytes (wavelet compression in lossy mode, which also has FDA 510(k) approval).
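The 1 MegaByte figure can be checked against the raw image size. The sketch below computes the uncompressed size and the compression ratios implied by the 1 MegaByte and 200 KiloByte figures. It assumes the 12 bit pixels are packed with no padding. Many systems store 12 bit pixels in 16 bits, which would make the raw size larger. The helper name is hypothetical.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw image size, assuming pixels are packed with no padding."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


raw = uncompressed_bytes(14, 17, 150, 12)
print(raw)                 # raw size of a 14 x 17 inch chest X-ray at 150 DPI, 12 bits
print(2 ** 12)             # shades of gray at 12 bits per pixel
print(raw / 1_000_000)     # implied ratio for the 1 MegaByte lossless figure
print(raw / 200_000)       # implied ratio for the 200 KiloByte lossy figure
```

With these assumptions the raw image is about 8 MBytes. The lossless figure therefore implies roughly 8:1 compression and the lossy figure roughly 40:1.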

Units of Measure (Digital)

1 Byte (B) (common usage) = 8 bits (b) = 1 character (Byte and bit are best spelled out.); 1 Unicode character = 16 bits = 2 Bytes

1,000 Bytes = 1 KiloByte (exactly 1 thousand in common and legal usage); in computer terms, 1 KiloByte = exactly 1,024 Bytes (2**10, 2 to the 10th power).

1,000 KBytes = 1 MegaByte (exactly 1 million in common and legal usage); in computer terms, 1 MegaByte = exactly 1,024 KBytes = 1,048,576 Bytes (2**20, 2 to the 20th power).

Due to lawsuits in recent years, only the legal terms can be used commercially.

1,000 MBytes = 1 GigaByte (billion); 1,000 GBytes = 1 TeraByte (trillion); 1,000 TBytes = 1 PetaByte (quadrillion); 1,000 PBytes = 1 ExaByte (quintillion); 1,000 EBytes = 1 ZettaByte (sextillion); 1,000 ZBytes = 1 YottaByte (YByte) (septillion).
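The gap between the legal (decimal) and computer (binary) meanings grows with each prefix. The sketch below lists both values for every prefix above. The function names are illustrative.

```python
PREFIXES = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta", "Yotta"]


def decimal_bytes(prefix: str) -> int:
    """Common and legal usage: powers of 1,000."""
    return 1000 ** (PREFIXES.index(prefix) + 1)


def binary_bytes(prefix: str) -> int:
    """Traditional computer usage: powers of 1,024 (2**10)."""
    return 1024 ** (PREFIXES.index(prefix) + 1)


for p in PREFIXES:
    gap = binary_bytes(p) / decimal_bytes(p) - 1
    print(f"1 {p}Byte: {decimal_bytes(p):,} vs {binary_bytes(p):,} Bytes ({gap:.1%} larger)")
```

The binary value is about 2.4% larger at Kilo and about 10% larger at Tera.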

1 millisecond (ms) = 1/1,000 second; 1 microsecond (us) (u is substituted for the Greek letter mu) = 1/1,000 ms; 1 nanosecond (ns) = 1/1,000 us; 1 picosecond (ps) = 1/1,000 ns; 1 femtosecond (fs) = 1/1,000 ps; 1 attosecond (as) = 1/1,000 fs; 1 zeptosecond (zs) = 1/1,000 as; 1 yoctosecond (ys) = 1/1,000 zs

1 Hertz = 1 cycle per second (e.g. 1 clock cycle in a computer, which corresponds roughly to the execution of 1 instruction). A signal or action of 1,000 cycles per second is called a 1 KiloHertz (1 KHz) signal or action, and each of its cycles is a millisecond long. 1,000 KHz = 1 MegaHertz.

Each frequency prefix pairs with the time prefix of the inverse power of ten (10 to the +n and 10 to the -n): KHz and ms (3), MHz and us (6), GHz and ns (9), THz and ps (12), PHz and fs (15), EHz and as (18), ZHz and zs (21), YHz and ys (24).

Speed = wavelength X frequency (e.g. for C, the speed of light, which is a constant). Light travels about 300 MegaMeters (Mm) in 1 second. Blue light has a wavelength of about 400 nm and red light about 700 nm, so the frequency of light is about 750 THz for blue light and about 430 THz for red light.
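The sketch below applies the two relationships above: speed = wavelength X frequency, and cycle length = 1 / frequency. It uses the exact defined value of C rather than the rounded 300 MegaMeters per second. The function names are illustrative.

```python
SPEED_OF_LIGHT = 299_792_458  # meters per second (about 300 MegaMeters per second)


def frequency_hz(wavelength_m: float) -> float:
    """speed = wavelength x frequency, so frequency = speed / wavelength."""
    return SPEED_OF_LIGHT / wavelength_m


def period_s(frequency: float) -> float:
    """The length of one cycle is the reciprocal of the frequency."""
    return 1 / frequency


blue = frequency_hz(400e-9)
red = frequency_hz(700e-9)
print(f"blue: {blue / 1e12:.0f} THz, red: {red / 1e12:.0f} THz")
print(period_s(1_000))   # a 1 KHz signal has a 1 ms (0.001 s) cycle
```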

Pages per Second (Communications)

Modem = 56 Kbit per second = 3 pages per minute (about US$30.00 per month for a standard phone line)

ISDN (2 voice channels) = 128 Kbit per second = 10 pages per minute (about US$100.00 per month ISDN charge)

Cable (TV) modem = about 500 Kbit per second = 1 page per second (about US$50.00 per month)

T1 (24 voice channels) = 1.544 Mbit (Megabit) per second = 3 pages per second (about US$1,000.00 per month)

Ethernet (CSMA/CD) = 1 Mbit per second (effective) or 10 Mbit per second (nominal) = 2 pages per second

OC3 ATM (Optical Carrier, Asynchronous Transfer Mode) = 155 Mbit per second = 300 pages per second

OC192 (SONET: Synchronous Optical NETwork fiber) = 10 Gbit per second = 20,000 pages (2 file cabinets) per second

Dense Wavelength Division Multiplexing (DWDM) with OC192 = 320 Gbit per second = 64 file cabinets per second

Optical carrier frequency (1,300 nm) = 230 THz (about 20,000 cycles for every OC192 bit transmitted)
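A raw upper bound on pages per second is the link speed divided by the bits in one 50 KByte page (400,000 bits). The sketch below computes that bound and ignores all protocol and line overhead. The article's rounded figures are lower than this bound, presumably to allow for that overhead. For example, the article gives 3 pages per second on a T1, against about 3.9 here. The helper name is hypothetical.

```python
BITS_PER_PAGE = 50_000 * 8  # one 50 KByte scanned page


def raw_pages_per_second(bits_per_second: float) -> float:
    """Upper bound on pages per second, ignoring protocol and line overhead."""
    return bits_per_second / BITS_PER_PAGE


LINKS = {
    "Modem": 56e3,
    "ISDN": 128e3,
    "Cable modem": 500e3,
    "T1": 1.544e6,
    "Ethernet (effective)": 1e6,
    "OC3 ATM": 155e6,
    "OC192": 10e9,
    "DWDM with OC192": 320e9,
}

for name, rate in LINKS.items():
    print(f"{name}: {raw_pages_per_second(rate):,.2f} pages per second")

# Carrier cycles per bit: a 1,300 nm optical carrier against a 10 Gbit per second stream.
carrier_hz = 299_792_458 / 1300e-9
print(f"{carrier_hz / 10e9:,.0f} carrier cycles per OC192 bit")
```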

DVD (Digital Video Disc)

1 DVD (commonly Digital Video Disc) (same physical size as a CD ROM) = 7.9 GByte (WORM) (10 file cabinets)

DVD WORM (Write Once, Read Many) (2 sided, 1 layer per side) = 7.9 GBytes (3.95 GBytes per side); DVD RW (overwrite, ReWrite) (2 sided, 1 layer per side) = 5.2 GBytes; DVD ROM (Read Only Memory) (2 sided, 2 layers per side) = 17 GBytes.

Multimedia: 5 channel theater quality surround sound (5.1, Dolby AC-3), 96 KHz / 24 bit audio, 8 language tracks, 32 subtitle tracks, and about 135 minutes (long enough to accommodate 94% of all movies) of high quality video (720 pixels of horizontal resolution) on each of 4 layers. DVDs support runtime editing, so all ratings of a movie can be on the same DVD; 'R' rated scenes can be skipped as the DVD is played.

The file format is ISO 13346 UDF (Universal Disc Format), which harmonizes all CD recording standards, including ISO 9660. A future technology, 3rd generation blue lasers (sort of a blue light special: blue light has a wavelength about half that of red light), should yield a 40 GigaByte DVD ROM for HDTV.

Paper, Trees, COLD, and Scanning

1 pulp tree (loblolly pine) = 1/10th cord of wood = 10,000 pages = 1 File Cabinet = 4 boxes = 1/2 GigaByte = 1 CD

1 lumber tree (20 inch diameter, 110 ft tall, 50 years old) = 1 cord = 10 pulp trees (8 in. dia., 50 ft tall, 20 yrs old)

1 cord = 4 ft x 4 ft x 8 ft = 128 cubic feet as stacked for storage (75 cubic feet of wood) = 100,000 pages = 5 GigaBytes

1 word processor or OCR'ed (Optical Character Recognition) page = 5 KBytes (all pages listed above are scanned pages)

1 compressed page of COLD (Computer Output to Laser Disc) or COOL (Computer Output On-Line) (including index) = 2 KBytes for letter size statements, 4 KBytes for an 11 x 14 inch fanfolded greenbar computer sheet, and 10 KBytes for All Points Addressable (APA) pages such as IBM AFP (Advanced Function Printing) and Xerox Metacode.

Minimum commercial scanning cost for backfile conversion (more than 1 million pages) = about 5 US cents per page [Article 009v60]
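Combining the per-page figures in this section gives a rough backfile estimate. The sketch below assumes the 5 cent minimum scanning price, 50 KBytes per scanned page, 5 KBytes per OCR'ed text page, and 10,000 pages per pulp tree. `backfile_estimate` is a hypothetical helper.

```python
SCANNED_BYTES_PER_PAGE = 50_000   # black & white, CCITT G4
TEXT_BYTES_PER_PAGE = 5_000       # word processor or OCR'ed text
SCAN_COST_CENTS_PER_PAGE = 5      # minimum for backfiles over 1 million pages
PAGES_PER_PULP_TREE = 10_000


def backfile_estimate(pages: int) -> dict:
    """Rough cost and storage for scanning a paper backfile."""
    return {
        "scan_cost_dollars": pages * SCAN_COST_CENTS_PER_PAGE / 100,
        "scanned_bytes": pages * SCANNED_BYTES_PER_PAGE,
        "ocr_text_bytes": pages * TEXT_BYTES_PER_PAGE,
        "pulp_trees": pages / PAGES_PER_PULP_TREE,
    }


print(backfile_estimate(1_000_000))
```

For 1 million pages this gives about US$50,000 in scanning costs. It also gives 50 GigaBytes of images (or 5 GigaBytes of OCR'ed text) and the paper from about 100 pulp trees.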


Updates and More Detailed Descriptions

When using the information in this article, please check the website http://www.ArchiveBuilders.com for updates. The version number for this article is located at the end of the article and in the Note to Editors section below. The website also has articles that provide more details on some of the terms and concepts in this article.

Comments

Please let us know how you like this paper, or if you have any questions. What would you like to see in the future? For more, and for the most recent version of this article, please visit our web site at http://www.ArchiveBuilders.com. We also have the articles in Microsoft Word format, which prints on far fewer pages than the HTML version. Also, please let us know where you saw this article.

Please send your comments via email to:
SteveGilheany@ArchiveBuilders.com
Tel: +1 (310) 937-7000. Fax: +1 (310) 937-7001.

Acknowledgements

Reprinted from Archive Planning, Volume 2, number 7, 1998, Archive Builders' analysis newsletter for document management.

See http://www.ArchiveBuilders.com.

All trademarks are the property of their respective holders.

Note to Editors

Article 009v60

We will continue to update these articles as we get comments. Please contact us for the most current version before you publish, and please request permission to publish the article. Permission will be given freely for most purposes. Also, please send us a copy of the publication when you publish the article. The articles are also available in a Microsoft Word format that can be printed on far fewer pages than the HTML format.

Steve Gilheany
Archive Builders
1147 Manhattan Avenue, Suite 322
Manhattan Beach, CA 90266
Tel: +1 (310) 937-7000 Fax: +1 (310) 937-7001
SteveGilheany@ArchiveBuilders.com

Bio

Steve Gilheany (BA in Computer Science; MBA; MLS Specialization in Information Science; CDIA, Certified Document Imaging System Architect; AIIM Master and AIIM Laureate of Information Technologies; CRM, Certified Records Manager, ARMA) has seventeen years of experience in document imaging and is a Sr. Systems Engineer at Archive Builders.

Author

Steve Gilheany is a Sr. Systems Engineer at Archive Builders. He has worked in digital document management and document imaging for seventeen years.

His experience in the application of document management and document imaging in industry includes: aerospace, banking, manufacturing, natural resources, petroleum refining, transportation, energy, federal, state, and local government, civil engineering, utilities, entertainment, commercial records centers, archives, non-profit development, education, and administrative, engineering, production, legal, and medical records management. At the same time, he has worked in product management for hypertext, for window-based user interface systems, for computer displays, for engineering drawing, letter size, microform, and color scanning, and for xerographic, photographic, newspaper, engineering drawing, and color printing.

In addition, he has nine years of experience in data center operations and in database and computer communications systems design, programming, testing, and software configuration management. He has an MLS Specialization in Information Science and an MBA with a concentration in Computer and Information Systems from UCLA, a California Adult Education teaching credential, and a BA in Computer Science from the University of Wisconsin at Madison. His industry certifications include the CDIA (Certified Document Imaging System Architect) and the AIIM Master and AIIM Laureate of Information Technologies (from AIIM International, the Association for Information and Image Management, http://www.AIIM.org), and the CRM (Certified Records Manager) (from the ICRM, the Institute of Certified Records Managers, an affiliate of ARMA International, the Association of Records Managers and Administrators, http://www.ARMA.org).

Contact:

SteveGilheany@ArchiveBuilders.com
Tel: +1 (310) 937-7000
Fax: +1 (310) 937-7001

For more information, courses, and papers:
http://www.ArchiveBuilders.com

