William St. Clair on Improving the Research Potential of ESTC


Mr Brian Geiger

Center for Bibliographical Studies & Research
University of California, Riverside
Highlander Hall, Building B, Room 114
Riverside, California 92507

17 April 2012

Dear Mr Geiger

Improving the research potential of ESTC: consultation: A modest suggestion by William St Clair

I welcome the opportunity extended through SHARP-L to offer suggestions for making ESTC more useful for researchers in the 21st century. Much of future research will, we can be confident, take the form of quantitative analysis, not necessarily just for checking of historical records, but for identifying trends, trying to recover the reading of the past, generating explanatory models and so on. The current suggestions for improving the interrogability of the present resource are to be welcomed.

However, I suggest, if we want to make the ESTC a research tool for the more ambitious questions, as is the stated aim, then in my view, the proposed changes will make only a limited contribution. Although in the English-speaking world, we have excellent catalogues and bibliographies, to my mind, the empirical factual basis on which those who attempt to address ambitious questions are reliant is seriously inadequate. Indeed, I suggest, the extent of the present inadequacy of data would not be tolerated by those familiar with the standards applicable and expected in the sciences and social sciences.

The biggest weakness for anyone attempting a history of books, or of the book industry, or of reading, is that 'titles' is a poor measure of book production. What we need, for a start, if we want to map the material extent of past production, are figures for print runs and sales, and also for price, as a good indicator of potential access, plus explanatory economic models for the various governing regimes [guild system, perpetual copyright, pirate and offshore, and so on]. We also need to develop formal ways of recognising and offsetting the inadequacies of the patchy archival record. Draft proposals for an ambitious project that would enable this kind of step change improvement in the data to be made have been prepared. If they are proceeded with, it will however take time before results become available and we see the benefits.

However, there is a notable weakness in the current situation that can be easily addressed and remedied, and that would be a helpful step in the right direction of making ESTC a more useful research tool. ESTC should, I suggest, consider the suggestion that I have made in print and in lectures on a number of occasions, most fully in my chapter in The Cambridge History of the Book in Britain, volume 6.

ESTC is, in a way, a victim of its own success. For the ready availability of lists of titles has fostered an illusion of completeness and that has led to users attempting to use it for purposes for which it is inadequate. Indeed the consultation document that has been circulated helps to prolong the illusion. I quote: 'The English Short Title Catalogue (ESTC) is a union catalog and bibliography of everything printed between 1473 and 1800 in England and its former colonies or in the English language elsewhere.' In fact, as I understand the situation, ESTC is a union catalogue of titles of which at least one copy is known to have survived somewhere in the world.

That is not a debating point. It has long been known that the survival rate of books and print from the early centuries is badly incomplete. And this is not just a general common sense understanding. We have good evidence of the large scale of the losses. D. F. McKenzie’s observation that the size of the English printing industry, as measured by the physical capital (presses) and personnel employed (apprentices and printers) scarcely changed between the mid sixteenth and late seventeenth century can only be squared with the sharp rise in surviving titles over the same period by postulating either that a high proportion of the industry was maintained in unemployment or that more output occurred than has survived, or some combination. [In CHBB, iv,17]. Since, until the early eighteenth century, the English state attempted to control the texts that were permitted to circulate within its jurisdiction not only by an array of direct textual controls but also by limiting the capacity of the industry, measured not by titles but by numbers of printed sheets, it is highly unlikely that a huge proportion of industrial capacity was kept in idleness or in reserve.

And we know the titles of many of these lost printed books. It has been known, at least since 1875, that the Stationers’ Company register included only a proportion of titles of which copies survive. [E Arber, A Transcript of the Registers of the Company of Stationers of London 1554-1640 volume 1]. The finding in my book The Reading Nation in the Romantic Period, 2006, 74-75, and 495-496, not challenged as far as I know, that large numbers of abridged ‘ballad versions’ of biblical stories were officially permitted until a sudden stop around 1600 depended upon my taking account of these lost, but registered, pieces of printed literature by scrutinising the registers myself. This result, and there are others, could not have emerged by interrogating ESTC nor would the current proposals to improve interrogability help. It is simply inadequate.

And it is not only in the early centuries of print that lists of titles known to survive are inadequate. For the eighteenth century, the survival rate of books known to have been produced, for example by being listed in advertisements, looks good for expensive books, patchy for some genres such as novels, and extremely poor for cheaper print. [Especially the two Dicey catalogues. Discussed in Reading Nation 340, and there appear to have been more cheap reprints of titles that entered the newly created public domain in the period after 1774 than are listed]. Soon after 1800, the cut-off date of ESTC, when we move to the age of stereotyping, 'titles' is such a poor indicator of production as to be of little value, and the survival rate for cheaper print known to have existed is even poorer.

ESTC cannot be held responsible if others misuse the information. But already it is used to produce bogus statistics, even for titles, even for trends. Indeed, in some books and articles, the software that enables graphs, pie charts, bar charts and so on to be easily produced has given a spurious pseudo-scientific plausibility to results whose factual basis is simply not able to support them.

There are no conceptual or methodological problems in my modest proposal for including lost books in ESTC. Provided the results can be aggregated with ESTC, they could be kept separate. The resources needed are largely clerical, potentially realisable by crowd sourcing. The potential improvement in the quality of research would, I suggest, be highly cost effective.

Yours sincerely
William St Clair

(source, accessed 04-23-2012).
