Click here to download a “clean” .txt document of Christopher Marlowe, Thomas Nashe, and William Shakespeare’s Henry VI, Part III (Regular spelling).
The file has undergone the following data cleaning protocols in order to make it suitable for text analysis:
- Line numbers, IMG and SIG information removed using RegEx: [a-z]+\s\d\d\d\d
- Page breaks and indents removed manually
- Speaker tags removed manually
- Spaces entered between speakers
- Beginning publishing information and ending footnotes removed
- Spaces added between words as needed
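The RegEx step in the list above can be sketched in Python as follows. This is a minimal illustration, not the script actually used for the cleaning: the function name `strip_markers` and the sample line are hypothetical, and the exact form of the line-number, IMG and SIG markers in the source file is assumed to be a lowercase tag followed by a space and four digits, which is what the pattern matches.

```python
import re

# Pattern quoted in the cleaning list: one or more lowercase letters,
# one whitespace character, then exactly four digits.
MARKER = re.compile(r"[a-z]+\s\d\d\d\d")


def strip_markers(text: str) -> str:
    """Delete every substring of `text` that matches MARKER."""
    return MARKER.sub("", text)


# Hypothetical sample line; the real marker layout is an assumption.
sample = "I wonder how the king escaped our hands. img 0003"
cleaned = strip_markers(sample)
print(repr(cleaned))
```

Note that the pattern also matches any ordinary lowercase word followed by a four-digit number (for example a year), so a check of the matches before deletion may be worthwhile on other texts.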
Data Cleaning Credit: Meggan Law (Framingham State University ’24)
Henry VI, Part III.txt