The file has undergone the following data cleaning steps to make it suitable for text analysis:

  • Line numbers, IMG and SIG information removed using RegEx: [a-z]+\s\d\d\d\d
  • Page breaks and indents removed manually
  • Speaker tags removed manually
  • Spaces inserted between speakers' lines
  • Beginning publishing information and ending footnotes removed
  • Spaces added between words as needed
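
The RegEx step listed above can be sketched in Python roughly as follows. This is a minimal illustration, not the script actually used for this file: the helper name strip_tags, the ignore_case option, and the sample string are all assumptions. Only the pattern itself comes from the notes. Because the pattern is written with lowercase [a-z], it matches uppercase tags such as "IMG 0012" only if the search is case-insensitive; the notes do not say which setting was used.

```python
import re

# Pattern quoted in the cleaning notes:
# one or more letters, one whitespace character, exactly four digits.
TAG_PATTERN = r"[a-z]+\s\d\d\d\d"


def strip_tags(text: str, ignore_case: bool = False) -> str:
    """Remove every match of TAG_PATTERN from text.

    Hypothetical helper for illustration only. ignore_case is an
    assumption; the notes do not state whether matching was
    case-sensitive.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(TAG_PATTERN, "", text, flags=flags)


# Made-up sample line containing two tag-like fragments.
sample = "img 0012 Now is the winter sig 0003 of our discontent"
cleaned = strip_tags(sample)

# Collapse the extra spaces left behind by the removals.
cleaned = " ".join(cleaned.split())
print(cleaned)
```

Note that the pattern also matches any ordinary lowercase word followed by a four-digit number (for example "year 1599"), so a pass like this is best reviewed by hand before the results are saved.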

Data Cleaning Credit: Meggan Law (Framingham State University ’24)

