
METHODOLOGY

Dataset

The dataset used in this analysis was obtained directly from the group’s anonymous moderator on January 14, 2023. According to the moderator, it contains the full text of all submissions posted from October 22, 2019 to January 12, 2023. During this period, the Facebook group changed moderators and names several times, and the ‘Prince’ did not independently verify that the dataset includes every post made to the group.

Data Cleaning & Tokenization

We removed all punctuation and capitalization from submissions, so “princeton,” “Princeton,” and “pRinceton” are all counted as the same term. Unless otherwise noted, we used a whitespace tokenizer to track the frequency of particular terms: after cleaning, “I love data” is split into “i”, “love”, and “data”.
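The cleaning and tokenization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `tokenize` is hypothetical, and it assumes that only ASCII punctuation is stripped.

```python
import string


def tokenize(text: str) -> list[str]:
    """Lowercase the text, drop punctuation, then split on whitespace."""
    # Assumption: only ASCII punctuation (string.punctuation) is removed;
    # curly quotes or emoji would need additional handling.
    table = str.maketrans("", "", string.punctuation)
    cleaned = text.lower().translate(table)
    return cleaned.split()


print(tokenize("I love data"))            # ['i', 'love', 'data']
print(tokenize("pRinceton, Princeton!"))  # ['princeton', 'princeton']
```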

Frequently Used Metrics


Post Mentions

Each of our word frequency charts plots two series over time: the number of posts that contain a particular word, and the fraction of posts in a given month that contain that word.
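To make the two series concrete, here is a rough sketch of how a monthly count and a monthly fraction could be computed. The function name `monthly_mentions`, the `(date, tokens)` input shape, and the sample posts are all assumptions made for illustration. They are not taken from the dataset.

```python
from collections import Counter
from datetime import date


def monthly_mentions(posts, word):
    """Return {(year, month): (count, fraction)} for posts containing `word`.

    `posts` is an iterable of (date, tokens) pairs, where tokens are assumed
    to be already cleaned and tokenized.
    """
    totals = Counter()  # all posts per month
    hits = Counter()    # posts per month that contain the word
    for posted_on, tokens in posts:
        month = (posted_on.year, posted_on.month)
        totals[month] += 1
        if word in tokens:
            hits[month] += 1
    return {m: (hits[m], hits[m] / totals[m]) for m in sorted(totals)}


# Made-up sample posts, for illustration only.
sample_posts = [
    (date(2020, 3, 1), ["zoom", "classes", "start"]),
    (date(2020, 3, 15), ["i", "miss", "campus"]),
    (date(2020, 4, 2), ["zoom", "again"]),
]
print(monthly_mentions(sample_posts, "zoom"))
# {(2020, 3): (1, 0.5), (2020, 4): (1, 1.0)}
```

A post counts once no matter how many times the word appears in it, which matches the "posts that contain a particular word" wording above.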

Word Co-Occurrence

One statistic we looked at was word co-occurrence: the share of posts containing one word that also contain another word. For example, if 100 posts contained the word “Princeton” and 10 of those posts also contained the word “University,” the co-occurrence of “University” with “Princeton” would be 10/100, or 10 percent.
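The worked example above translates directly into code. The sketch below uses a hypothetical `co_occurrence` function and synthetic posts built to reproduce the 10-in-100 case. Returning 0 when the first word never appears is an assumption, not something the methodology specifies.

```python
def co_occurrence(posts, word, other):
    """Share of posts containing `word` that also contain `other`."""
    with_word = [tokens for tokens in posts if word in tokens]
    if not with_word:
        return 0.0  # assumption: treat "word never appears" as 0
    both = sum(1 for tokens in with_word if other in tokens)
    return both / len(with_word)


# Synthetic posts: 100 mention "princeton", 10 of those also mention "university".
sample = (
    [["princeton", "university"]] * 10
    + [["princeton"]] * 90
    + [["university"]] * 5
)
print(co_occurrence(sample, "princeton", "university"))  # 0.1
```

The measure is not symmetric: in the same sample, 10 of the 15 posts containing “university” also contain “princeton,” so swapping the arguments gives a different value.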

Word Association and Word Clouds

When finding the most common words used in posts containing a given word and when creating word clouds, we removed stopwords (i.e. commonly used words such as “the” and “and”). We used the Python word cloud generator to generate the word clouds. It also counts common bigrams (pairs of adjacent words), so a pair of words shown together in a word cloud also appears together in posts. For instance, in the word cloud for “divest”, the word appears both by itself and as part of the bigram “divest princeton.” Unless stated otherwise, the “most common words” highlighted in the article are the 30 most common.
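As a rough illustration of the word-association step, the sketch below counts the most common words and bigrams in posts that contain a target word, after dropping stopwords. It uses only the standard library rather than the word cloud generator itself. The function names, the tiny stopword list, and the sample posts are hypothetical, and forming bigrams after stopword removal is an assumption about how pairs are counted.

```python
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "a", "of", "to", "is"}


def associated_words(posts, word, n=30, stopwords=STOPWORDS):
    """Top-n non-stopword tokens across posts that contain `word`."""
    counts = Counter()
    for tokens in posts:
        if word in tokens:
            counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


def top_bigrams(posts, word, n=30, stopwords=STOPWORDS):
    """Top-n adjacent word pairs across posts that contain `word`."""
    counts = Counter()
    for tokens in posts:
        if word in tokens:
            # Assumption: stopwords are dropped before pairing neighbors.
            kept = [t for t in tokens if t not in stopwords]
            counts.update(zip(kept, kept[1:]))
    return counts.most_common(n)


# Made-up, already-tokenized posts.
posts = [
    ["divest", "princeton", "now"],
    ["the", "university", "should", "divest"],
    ["divest", "princeton", "and", "the", "endowment"],
    ["dining", "hall", "is", "closed"],
]
print(associated_words(posts, "divest", n=2))  # [('divest', 3), ('princeton', 2)]
print(top_bigrams(posts, "divest", n=1))       # [(('divest', 'princeton'), 2)]
```

The target word is kept in its own counts here, consistent with “divest” appearing in its own word cloud.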


***