Click here to download a “clean” .txt document of Christopher Marlowe, Thomas Nashe, and William Shakespeare’s Henry VI, Part II (Regular Spelling).

The file has undergone the following data cleaning protocols in order to make it suitable for text analysis:

  • Line numbers, IMG and SIG information removed using RegEx: [a-z]+\s\d\d\d\d
  • Page breaks and indents removed manually
  • Speaker tags removed manually
  • Spaces entered between speakers
  • Beginning publishing information and ending footnotes removed
  • Spaces added between words as needed
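
For readers who want to try the RegEx step on their own text, here is a minimal Python sketch. It uses the pattern exactly as quoted in the list; the function name strip_markers and the sample lines are hypothetical, and the markers in the original source file may look different. It is an illustration only, not necessarily the procedure used to prepare this file.

```python
import re

# Pattern quoted in the list above: one or more lowercase letters,
# one whitespace character, then exactly four digits.
MARKER = re.compile(r"[a-z]+\s\d\d\d\d")


def strip_markers(text: str) -> str:
    """Remove every match of MARKER and tidy the leftover spacing."""
    without_markers = MARKER.sub("", text)
    # Collapse runs of spaces or tabs left behind by the removal.
    collapsed = re.sub(r"[ \t]{2,}", " ", without_markers)
    # Trim leading and trailing whitespace on each line.
    return "\n".join(line.strip() for line in collapsed.splitlines())


# Hypothetical sample; the real markers in the source may differ.
sample = (
    "img 0012 As by your high imperial majesty\n"
    "sig 0003 I had in charge at depart for France"
)
print(strip_markers(sample))
```

Note that the pattern also matches any ordinary lowercase word followed by a four-digit number (for example "in 1594"), so it is worth spot-checking the output before relying on it.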

Data Cleaning Credit: Meggan Law (Framingham State University ’24)

Henry VI, Part II.txt
