Back and Front Matter in the TCP corpora

‘Paratext’ may refer to various parts in, around and about the text. In his famous formula, “Paratext = Peritext + Epitext”, Gérard Genette included in his category of paratext ‘epitextual’ materials, namely those that lie outside the boundaries of a book (or other container of the text, such as a manuscript or webpage). Thus, for example, ‘paratext’ may also refer to correspondence or interviews which relate to the text. Peritexts, on the other hand, are closer chaperons of the text: running heads hover above the page, notes and other marginalia sit at its side or foot, scholastic marks may infiltrate between the lines, corrections bluntly interfere with them. In fact, ‘paratext’ is often used to denote even aspects of the text that are only very abstractly separable from it, such as design and typography.

In what comes next, however, I treat the rather paradigmatic paratext that is found in the separable front or back of the Early Modern book, as it appears in the EEBO and ECCO corpora and can be explored in the Text Creation Partnership texts. Conveniently, the TEI guidelines have “front”, “body” and “back” as the basic sections in the default structure of the text. This enables us to mine these paratextual phenomena and look at the corpus as if we were looking at the structure of the one grand book of the 16th-17th centuries, and then compare it to the grand book of the 18th. Here is how they look:

[Figure: front and back matter in the EEBO-TCP and ECCO-TCP corpora (screenshot, 2015-01-07)]

Only about two thirds of EEBO-TCP books have front matter, whereas the ECCO-TCP corpus differs significantly: front matter in the 18th century is pervasive. The trend is clearer when visualized chronologically:

[Figure: front and back matter over time (screenshot, 2015-01-07)]

If we trust our data, it seems that the structure of the book and especially the convention of front matter stabilized only in the 18th century. What made front matter so obligatory around 1600, and then again from 1700 onwards?

Theories about front matter and the transition from a system of patronage to a market of book consumers, or, on the other hand, about front matter and the construction of the modern author in its defiance of authority, are relevant, but they are much better discussed against the background of more refined analyses that treat the various genres of front or back matter separately. We will get to this later; at this point there is still a more general issue to discuss, which arises when we choose to ask the opposite question: what may have caused so many books in 1550-60, and almost half the books in 1680, to have no front matter?

And this is my concern: corpora can be mischievous. They mesmerise us with the allure of big data, but may sometimes trick us miners into believing that the data we get is History-given, when it is in fact what Ben Schmidt describes as “data artifacts” of the quirks of materiality, institutional history or cataloguing choices. These quirks often lurk in other historical layers than those which we study. This is why, when trying to draw conclusions from peaks and lows in the mined data, we need to develop a hermeneutics of corpus suspicion. Starting with the realization that our data is really capta, we then need to divert some attention from our historical subject matter to other places and periods, and get to know the captors of our data/capta: those who assembled it, stored it, catalogued, classified and tagged it, before we laid our tools on it.

Getting back to the charts above, a question comes to mind that should be directed to book historians: is it possible that the low points of front matter that we see in 1550-1560 and then in 1680 are, at least partly, traces of preservation history? That at times front matter or back matter was not deemed worthy of preserving and therefore never reached the digital corpus? We often see this neglect of paratext, especially allographic paratext (namely, written by someone other than the author), when text is transferred to the digital medium. Could it be that someone, somewhere, at some time in the last two or three centuries, perhaps when rebinding, selling, or storing, wanted to save space and simply chucked the pages from the fronts of books?

“There’s nothing wrong with being wrong”, writes Ben Schmidt. “To tap into all that knowledge out there, we need to be wrong in public, quite frequently”. I’ll take his advice and hope someone will come forward and settle this doubt.

There is, in fact, something else that is misleading in the charts above. This will be the subject of my next post.

And now to the recipe:

Again, the full scripts and the results can be found on my GitHub, this time in the Back and Front matter folder. Here I give only the gist of the Python script:

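# Gist only: input_root, text_file, doc_id, year and decade are set earlier in the full script.
# findall() returns a list of matching elements; an empty list means the section is absent.
# The path assumes EEBO is a direct child of the root element.
front_general = input_root.findall('./EEBO/TEXT/FRONT')
back_element = input_root.findall('./EEBO/TEXT/BACK')

if front_general and back_element:
    coverstate = 'both'
elif front_general:
    coverstate = 'frontonly'
elif back_element:
    coverstate = 'backonly'
else:
    coverstate = 'none'
# One tab-separated row per document.
text_file.write('%-30s\t%s\t%d\t%d\n' % (doc_id, coverstate, year, decade))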
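For readers who want to try the idea without the corpus at hand, here is a minimal, self-contained sketch of the same classification, followed by the per-decade tally that a chronological chart would need. It is not the script from my GitHub: the function names `cover_state` and `front_share_by_decade`, the `ETS` root element and the four toy documents are all assumptions made up for illustration, and the real TCP files are of course far richer.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def cover_state(root):
    """Classify a parsed document by which matter sections it contains.

    Assumes (hypothetically) that EEBO is a direct child of the root element.
    """
    has_front = root.find('./EEBO/TEXT/FRONT') is not None
    has_back = root.find('./EEBO/TEXT/BACK') is not None
    if has_front and has_back:
        return 'both'
    if has_front:
        return 'frontonly'
    if has_back:
        return 'backonly'
    return 'none'


def front_share_by_decade(records):
    """Take (decade, coverstate) pairs; return {decade: share of books with front matter}."""
    totals = Counter()
    with_front = Counter()
    for decade, state in records:
        totals[decade] += 1
        if state in ('both', 'frontonly'):
            with_front[decade] += 1
    return {decade: with_front[decade] / totals[decade] for decade in totals}


# Toy documents, invented for this sketch; not taken from the corpus.
SAMPLES = [
    (1550, '<ETS><EEBO><TEXT><BODY/></TEXT></EEBO></ETS>'),
    (1550, '<ETS><EEBO><TEXT><FRONT/><BODY/></TEXT></EEBO></ETS>'),
    (1600, '<ETS><EEBO><TEXT><FRONT/><BODY/><BACK/></TEXT></EEBO></ETS>'),
    (1600, '<ETS><EEBO><TEXT><BODY/><BACK/></TEXT></EEBO></ETS>'),
]

records = [(decade, cover_state(ET.fromstring(xml))) for decade, xml in SAMPLES]
shares = front_share_by_decade(records)
for decade in sorted(shares):
    print(decade, shares[decade])
```

Testing with `is not None` rather than relying on the truth value of an element is deliberate: an element with no children counts as false in ElementTree, so an empty FRONT would otherwise be mistaken for a missing one.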

About Sinai

Post-doctoral fellow at the Polonsky Academy, Jerusalem. Interested in text mining the language of dedications, prefaces, letters to the reader and other mainly - but not only - Early Modern kinds of paratext, and more generally, in what the digital humanities may hold for the study of paratext.