Calvin Mooers Imagines the "DOKEN" (Documentary Engine), a Hypothetical Machine to Search the World's Literature

1951

P. 2 of Mooers's report Making Information Retrieval Pay

In 1951 Calvin N. Mooers, whose Zator Company was the first information retrieval company, and who had coined the term "information retrieval" in 1950, issued a report entitled Making Information Retrieval Pay, Zator Technical Bulletin 55.

From this report I quote:

IX - The DOKEN

"Can the world-wide torrent of scientific information--from an estimated 30,000 periodicals containing an estimated 1,000,000 papers per annum--be met by any conceivable retrieval machine? The answer is yes, and the back-log (estimated roughly at 100,000,000 pieces) can be handled too.

"No existing machine is capable of doing a reasonable job of information retrieval on such a collection. The fastest electronic tabulating machinery would seem to require about 2,600 hours, or about 3 1/2 months, to scan a collection of 100 million pieces in answer to one request for information. The Microfilm Rapid Selector, according to published speeds, would take about 170 hours of steady running time or about a week to make the same search. Both these are too slow to meet a reasonable requirement that a central agency having such a machine should be able to make a number of searches each day, and to send out the bibliographies the same day the request was received.

"A machine that can do this job is actually possible--and it can be constructed within the limitations of our present technology. I will describe some of the features of such a machine in order that you will know what such a machine will be like when it is built. On the other hand, I can't tell you the date that this machine will actually be constructed because I cannot forecast when anyone will be able to afford it. The great expense is not in the machine. The machine will cost less than one of the enormous computing machines that we have been hearing so much about, and which some organizations seem to be able to afford. The real cost is handling and analyzing the magnitude of information in setting up the system. We should figure on a cost of at least $2 per item. Thus the annual cost of processing the world's information--$2,000,000--would be several times the cost of the machine itself. But, to get back to the details of our hypothetical machine:

"We will call the machine the D O K E N, which is short for 'documentary engine'. The DOKEN is capable of making a complete multi-subject search of 100 million items in about 2 minutes, and having scanned the record, it reproduces or prints a bibliography of the selected abstracts at a rate of about 10 per minute by a dry printing process. Many searches are conducted each hour, steadily, throughout the day. After the first DOKEN is operating, film records for other DOKENS can be inexpensively copied at a fraction of the original cost. A DOKEN is a most appropriate instrument for national or regional research centers. It would be the information retrieval auxiliary instrument at a large library center for the local collection plus the entire world's literature. For instance, it could scan the Library of Congress collection (10 million catalogued items) in 10 seconds.
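To step outside the quotation for a moment: the speeds Mooers cites can be cross-checked with a few lines of Python. This is only a sketch using the figures quoted above (100 million items; 2,600 hours, 170 hours, and about 2 minutes per search). The names `COLLECTION_ITEMS`, `SEARCH_HOURS`, and `items_per_second` are this note's own and do not come from the report.

```python
# Figures as quoted from the 1951 report; all names here are illustrative.
COLLECTION_ITEMS = 100_000_000

# Time for one complete search of the collection, in hours.
SEARCH_HOURS = {
    "electronic tabulator": 2600.0,
    "Microfilm Rapid Selector": 170.0,
    "DOKEN": 2.0 / 60.0,  # "about 2 minutes"
}


def items_per_second(hours: float, items: int = COLLECTION_ITEMS) -> float:
    """Scanning rate implied by searching `items` records in `hours`."""
    return items / (hours * 3600.0)


rates = {name: items_per_second(h) for name, h in SEARCH_HOURS.items()}

for name, rate in rates.items():
    days = SEARCH_HOURS[name] / 24.0
    print(f"{name}: {rate:,.0f} items/s, {days:,.3f} days per search")

# Speed-up of the DOKEN over the Rapid Selector implied by these figures.
speedup = rates["DOKEN"] / rates["Microfilm Rapid Selector"]
print(f"DOKEN vs. Rapid Selector: about {speedup:,.0f}x")

# Library of Congress example: 10 million items in 10 seconds.
loc_rate = 10_000_000 / 10
print(f"Library of Congress example implies {loc_rate:,.0f} items/s")
```

On these figures the two-minute search works out to roughly 830,000 items per second, the same order of magnitude as the million items per second implied by the Library of Congress example. The quotation resumes below.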

"The DOKEN can achieve the stated performance goal only by recourse to the most efficient techniques. That means that the job must be broken down into the different functional operations, and highly efficient specialized structures and methods be used to accomplish each. There are three separate functional organs that we must consider. They are: 1) the code storage and scanning engine, 2) the abstract record and reading engine, and 3) the abstract printing stations. These organs, unlike the corresponding elements in the Rapid Selector, are physically separate structures. We will consider them in turn.

"The Code storage and scanning engine contains the coded subjects of 100 million documents. Therefore, at least from considerations of sheer bulk, the most efficient possible subject coding must be used. The choice here is Zatocoding--the method of superimposition of random codes in each subject field--since this method seems to be considerably more efficient than any other coding scheme now known. We let each document be described by as many as 25 different cross-referenced subjects. The coded record is micro-photographed on photographic film, and this film strip is helically wound on a metal drum 10 feet in diameter and 7 feet long. This drum is driven at about 300 rpm, and the scanning head, following the helically-wound film, passes from one end to the other in less than a minute. The codes for more than one million documents are scanned in each second. This is about 5,000 times Rapid Selector speed. The basic principles of such a scanning head, able to do this with standard equipment, have been worked out. Selections, when made, are temporarily recorded as document or abstract numbers in an electronic or magnetic memory. The selections are made according to any simple or complex configuration of subject ideas, which can be chosen arbitrarily to suit the needs of the request at hand.
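As an aside from the quotation: the "superimposition of random codes" can be illustrated with a small Python sketch. It is not Mooers's actual code design. The field width, the number of marks per subject, and the use of a seeded random generator in place of a codebook are all assumptions made for this illustration, and the names (`FIELD_WIDTH`, `subject_pattern`, `encode`, `matches`) are hypothetical.

```python
import random

FIELD_WIDTH = 256      # assumed number of code positions per document record
MARKS_PER_SUBJECT = 4  # assumed number of positions marked for each subject


def subject_pattern(subject: str) -> int:
    """Return a reproducible pseudo-random bit pattern for one subject.

    Seeding the generator with the subject name stands in for a codebook
    that assigns each subject a fixed random pattern.
    """
    rng = random.Random(subject)
    pattern = 0
    for position in rng.sample(range(FIELD_WIDTH), MARKS_PER_SUBJECT):
        pattern |= 1 << position
    return pattern


def encode(subjects) -> int:
    """Superimpose (bitwise OR) the patterns of all subjects into one code."""
    code = 0
    for subject in subjects:
        code |= subject_pattern(subject)
    return code


def matches(document_code: int, query_subjects) -> bool:
    """Select a document if every mark required by the query is present."""
    query = encode(query_subjects)
    return (document_code & query) == query


# A tiny hypothetical catalogue: document number -> superimposed code.
catalogue = {
    1: encode(["chemistry", "polymers", "patents"]),
    2: encode(["insurance", "statistics"]),
}

hits = [number for number, code in catalogue.items()
        if matches(code, ["chemistry", "patents"])]
print(hits)
```

A document indexed under every subject in the query is always selected. Because the patterns overlap in one shared field, an unrelated document can occasionally be selected as well, and the chance of such a false selection depends on the field width and on how many subjects are superimposed. The quotation resumes below.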

"The abstract storage and reading engine is the organ which stores micrographic copies of 200-word abstracts and the citations for the documents. A single, large, square, semi-transparent sheet carries from a quarter million to a million of such abstracts. These sheets are stored in a stack, and by a mechanism like that of an automatic juke box record changer, the different sheets are pulled out of the stack to be read by an optical copying television head. This read head, using the two coordinate positions of the wanted abstract, finds the abstract, magnifies it, and electrically copies it into a wire circuit. Many such optical heads are working at the same time in the abstract storage engine. This abstract storage and reading engine fits nicely in an ordinary large-sized room, since the stack is only about 20 feet long.
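As a further aside from the quotation: the sheet stack and the "two coordinate positions" can be sketched in Python. The report does not say how abstracts are numbered or laid out on a sheet, so the sequential numbering and the 1,000 by 1,000 grid below are assumptions chosen to match the upper figure of a million abstracts per sheet. The names `GRID_SIDE`, `sheets_needed`, and `locate` are hypothetical.

```python
GRID_SIDE = 1000  # assumed layout: 1000 x 1000 = 1,000,000 abstracts per sheet


def sheets_needed(total_abstracts: int, per_sheet: int) -> int:
    """Number of sheets required, rounding up to a whole sheet."""
    return -(-total_abstracts // per_sheet)


def locate(abstract_number: int, grid_side: int = GRID_SIDE):
    """Map a sequential abstract number to (sheet, row, column).

    Assumes abstracts are numbered from 0 and fill each sheet row by row.
    """
    per_sheet = grid_side * grid_side
    sheet, offset = divmod(abstract_number, per_sheet)
    row, column = divmod(offset, grid_side)
    return sheet, row, column


# Sheets for a 100-million-item collection at the two quoted densities.
print(sheets_needed(100_000_000, 250_000))
print(sheets_needed(100_000_000, 1_000_000))

# Where the reading head would be sent for one abstract number.
print(locate(1_234_567))
```

At the quoted densities, 100 million abstracts would occupy between 100 and 400 sheets, which is consistent with a stack only about 20 feet long. The quotation resumes below.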

"The abstract printing stations are placed remote from the rest of the engine--at the request desk or in the mailing room for mail service. The process used is a fast dry-printing, employing either ultra-violet sensitive diazo paper, or an electro-sensitive facsimile paper. Photography (silver) and Xerography do not meet nearly as well the requirements for a fast, simple and cheap process for giving a single-copy. Presently available equipment, about the size of a table radio and now on the market, can produce about ten 200-word abstracts per minute at each station. There are as many stations in the operation as there are reading heads in the reading engine. The abstracts produced are reasonably clear, and are full sized and readable without any optical aid.

"Such is the DOKEN. It can be built if there is a need for it. Part of the world's intellectual output is already being abstracted. With cooperation, and less than 10% additional effort, this same information could be put into a DOKEN system. Perhaps this cooperative endeavour will take the pattern so well worked out by Chemical Abstracts with its large corps of volunteer abstractors, and smaller staff of central editors. If so, the cost of the world-wide documentary project could be whittled down to manageable proportions. Support could be on a subscription basis. Bibliographic searches to any request would be furnished by return airmail, giving an overnight service to information users.

"Smaller versions of the same instrument have a possible use in other situations, such as the whole chemical literature, the U.S. Patent Office, or the files of insurance companies. In such smaller collections, a much more complete subject coding is possible and would certainly be desirable in the case of patents.

"With regional DOKENs available, company collections of information on punched cards can be enriched by the inclusion of specially selected items from DOKEN bibliographies. But these bibliographies of abstracts would generally have to be pruned, recoded, and 'slanted' into the particular company's technical viewpoint in order to raise their utility up to the company's retrieval system threshold value." (pp. 10-12)

