Spring 2024 Internship – Summary of Project Work
Project Background:
This project showcases applications of Python for humanities-based text analysis. Using corpora created from William Shakespeare’s and Christopher Marlowe’s dramatic works, I wrote code for five text analysis tools: Word Clouds, Word Frequency graphs, Lexical Dispersion Plots, Topic Modeling, and Word Adjacency Networks. Because computer programming can still be intimidating for beginners, I also created accessible “how to” guides for downloading the required software (Anaconda) on both Mac and PC devices and for performing the analysis. See the Text Analysis page in the Research Resources section to explore these tools on your own. The public repository used for this internship can be found at https://github.com/alex-krtt/kitmarlowe-jupyter.
Jupyter Notebooks Created:
- Word Frequency: Visualizes the frequency of the most utilized adjectives, verbs, and nouns within the corpora.
- Lexical Dispersion: Analyzes and displays the distribution of selected words over the course of a text or corpus, highlighting their occurrence patterns.
- Topic Modeling: Employs algorithms to identify, categorize, and visualize the main themes or topics present within a large collection of texts.
- Word Adjacency Network Visualization using Seaborn: The Word Adjacency Network (WAN) visually represents the probability that two texts share the same author by analyzing the proximity of high-frequency words. This project presents a visualization of the WAN results, building upon the original code by Gabriel Egan.
- Wordcloud Generator: Creates visually engaging word clouds that emphasize the most prominent words found in the text, offering a quick insight into the text’s key themes.
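As a rough illustration of what the Word Frequency and Lexical Dispersion notebooks compute, the sketch below uses only the Python standard library. It is not the notebooks' own code: the function names are hypothetical, the sample is a single modernized line from Marlowe rather than a full corpus, and no part-of-speech filtering is applied (the notebooks restrict counts to adjectives, verbs, and nouns).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=5):
    """Return the top_n most common tokens with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def dispersion_offsets(text, word):
    """Return the token positions where `word` occurs.

    These positions are the x-values a lexical dispersion plot would mark.
    """
    target = word.lower()
    return [i for i, token in enumerate(tokenize(text)) if token == target]


# Hypothetical sample input: one line from Doctor Faustus, spelling modernized.
sample = (
    "Was this the face that launched a thousand ships, "
    "and burnt the topless towers of Ilium?"
)

print(word_frequencies(sample, top_n=1))  # most frequent token and its count
print(dispersion_offsets(sample, "the"))  # token positions of "the"
```

On a full play, the offsets returned by `dispersion_offsets` could be passed to a plotting library such as matplotlib to draw one row of tick marks per selected word.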
Blog Posts Created:
- Anaconda Installation Guide for macOS: Step-by-step instructions for setting up Anaconda on a macOS system.
- Anaconda Installation Guide for Windows 10/11: A comprehensive tutorial for installing Anaconda on Windows 10 or 11.
- Running Your First Jupyter Notebook: An introductory walkthrough for running your first Jupyter notebook using Anaconda and JupyterLab.
- Running Your First JupyterLab Notebook on the Cloud with GitHub Codespaces: A guide to launching and running the Jupyter notebooks on the cloud via GitHub Codespaces.
- GitHub Account Creation Guide: A straightforward guide to setting up a new GitHub account.
Internship Technical Details:
All programming and analysis for this project were performed in the Python programming language (version 3.11). Text editing and notebook development were primarily done in Microsoft’s Visual Studio Code, with occasional use of the JupyterLab web interface for quick data exploration and visualization. Version control and backup of the source code were handled with GitHub, using a public repository created specifically for this project (https://github.com/alex-krtt/kitmarlowe-jupyter).
The corpora of William Shakespeare and Christopher Marlowe were integrated into the repository to streamline access and analysis. The programming environments were configured by installing the Anaconda Python distribution on both macOS (Ventura) and Windows 11 test systems. Anaconda Navigator was used to launch JupyterLab and to manage Python environments and packages. The core data science packages used were NumPy, SciPy, matplotlib, spaCy, gensim, pyLDAvis, nltk, and scikit-learn. My code was exported with LaTeX, using the MacTeX distribution on macOS. The macOS and Windows test systems were configured without administrative permissions to simulate how users in academic environments with limited privileges would follow the provided installation guides to set up the programming environments and reproduce the text analysis workflow.
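For readers reproducing this setup, one quick way to confirm that an environment contains the packages listed above is to ask Python whether each one can be located. The snippet below is a minimal sketch rather than part of the project repository; the helper name is hypothetical, and the list uses import names, which can differ from install names (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages named in this report (assumed mapping:
# scikit-learn installs under the import name "sklearn").
PACKAGES = [
    "numpy", "scipy", "matplotlib", "spacy",
    "gensim", "pyLDAvis", "nltk", "sklearn",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

Because the check only looks packages up without importing them, it runs quickly and needs no administrative permissions.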