Transcribing Early Modern Texts for Encoding

Part 1: Rules for transcribing early modern texts for publication on the KMP site and TAPAS platform

Part One of this transcription guide closely follows the “How-to #0: Rules for transcription” guide authored by Janelle Jenstad and Martin Holmes at The Map of Early Modern London. Every transcription project has its own set of editorial principles, guidelines and rules. The MoEML approach is very straightforward and we shall follow their lead.The initial transcription is a diplomatic transcription; in other words, we try to reproduce every feature of the original text as precisely as possible.

Spacing

MoEML closes up extra spaces between words and punctuation marks. However, we retain the spacing in authorial initials, such as A. M. (for Anthony Munday). We add a single space after a coma when the comma has been used to separate two words.

Unusual characters

Unusual characters would be reproduced. A range of archaic and unusual glyphs (hieroglyphic character or pictograph) may appear in your texts. All should be available in Unicode, so you should be able to reproduce them.

For example:

Long s = ſ Unicode: 017f

Nasal tildes = ~ Unicode 007E (Note: you’ll likely need to find individual letters with tildes over them)

Vowel ligatures

œ = Unicode 0153

æ = Unicode 00E6

E with macron: ē Unicode 0113 (Note: expect o’s with macrons also)

Thorns yͤ

Type the letter y
Type 0364 right after the y, no spaces: y0364
Alt-x

This list is hardly exhaustive. You will likely have to search the internet for many special characters that you find in these texts. That said, please expand ct, ff, fi, ffi, ffl, ft, and st ligatures.

Line breaks

During transcription, you’ll reproduce line breaks by hitting “return.” Whether we are encoding prose or verse, we reproduce line breaks in the text.

Hyphenation

Preserve hyphenation, both within and at the end of lines. While transcribing, put end of line hyphens in [square brackets] so that you may encode them into the line breaks later.

Interchangeable characters (vujis!)

We retain the interchangeable u/v and i/j and the use of vv for w.

Typographical decorations

Typographical decorations including bold, italic, and underline, should not be reproduced in your transcription. We encode these features with CSS code later on.

Part II: Transcription tips!

As we move forward in this round of encoding, we’ll use this post to compile prof. and class-generated transcription tips.

The most frequently asked question in this part of the course is: What do I do with all of the stuff I’m not formatting right now, like italics, font shifts, headers, indents, chapter headers, catchwords, and signatures (to name a few)?

Put special features in square brackets for now. When we shift your transcriptions from the Word OneDrive files that you’re collaborating with your teams on, our oXygen software will not “read” the square brackets as code. They are also easy to search and replace with the appropriate tags.
Rowan (Fall ’17 Rogue; Spring ’18 TA) suggests you also highlight special textual features so that they are easier to see and proof as you go along.
Another trick for speedier encoding of special characters that Zoe (Spring ’18 Rogue) has discovered is to highlight unusual to us, but common special characters in the text for quick replacement. She and her team highlighted small “s” characters that needed to be changed to a long ſ and changed them with judicious use of the find/replace feature.

Part III: Bibliographic features of the text

Use the following diagram as a reference to remind yourself about components of the text that will require special formatting moving forward. My hope also is that this diagram will serve as a reminder to include running headers, catchwords, and signatures – whether they have been omitted or not. Please feel free to ask as many questions as you have as we move forward!

Please click on the image below to access a full-page pdf of the diagrammed text.

Kristen Abbott Bennett, 22 March 2018

Return to Teaching Resources

Return to About

Transcribing Early Modern Texts for Encoding