Click here to download a “clean” .txt file of I Henry VI (Regular Spelling) by Christopher Marlowe, Thomas Nashe, and William Shakespeare.
The file has undergone the following data cleaning steps to make it suitable for text analysis:
- Line numbers, IMG and SIG information removed using RegEx: [a-z]+\s\d\d\d\d
- Page breaks and indents removed manually
- Speaker tags removed manually
- Spaces entered between speakers
- Beginning publishing information and ending footnotes removed
- Spaces added between words as needed
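For readers who want to reproduce the RegEx step on another text, here is a minimal Python sketch. It is not the original workflow, which is not documented beyond the pattern itself. The function name `strip_markers`, the sample lines, and the marker format (`img 0001`, `sig 0002`) are assumptions made for illustration; only the pattern is taken from the list above. The whitespace tidying after the removal is also an addition.

```python
import re

# Pattern quoted in the cleaning notes: one or more lowercase letters,
# one whitespace character, then exactly four digits.
MARKER = re.compile(r"[a-z]+\s\d\d\d\d")


def strip_markers(text: str) -> str:
    """Remove every match of MARKER, then tidy leftover spacing.

    Hypothetical helper for illustration only.
    """
    without_markers = MARKER.sub("", text)
    # Collapse runs of spaces or tabs left behind by the removal.
    collapsed = re.sub(r"[ \t]{2,}", " ", without_markers)
    # Trim trailing whitespace on each line.
    return "\n".join(line.rstrip() for line in collapsed.splitlines())


# Invented sample; the marker format in the real source file may differ.
sample = "Hung be the heavens with black, img 0001\nyield day to night! sig 0002"
cleaned = strip_markers(sample)
print(cleaned)
```

As written, the pattern is case-sensitive and matches any lowercase word followed by a four-digit number, so a phrase such as "year 1591" in running text would also be removed. It is worth checking the matches before deleting them.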
Data Cleaning Credit: Meggan Law (Framingham State University ’24)
Henry VI, Part I.txt