Complexity Explorer Santa Fe Institute

Foundations & Applications of Humanities Analytics (Spring 2023)

This course is no longer in session.

8.1 Case Study: Capitalism & Democracy » Chapter 5 part I Overview

What you will learn from this chapter

Through an example, you will learn how to use newspaper archives to study how people throughout history have linked two key concepts: capitalism and democracy. In general, you will recognize the extent to which it is possible to operationalize concepts that have deep and complex meanings, and how a simple operationalization makes it possible to apply straightforward computational and mathematical methods (here, word searches and counting). In particular, you will see how word frequencies can be used to summarize changes over time in how much attention readers of different newspapers paid to democracy and capitalism, and how those changes intersect with major historical events.


Key terms to keep in mind


Visualization   A presentation of a pattern within data in pictorial form (e.g., chart, graph) that does not require a narrative explanation or extensive text for the viewer to interpret.


Intellectual history   The area of history concerned primarily with how and why people's ideas – and the sorts of ideas that are popular in a given moment or place – change over time. One can contrast intellectual history with political history, which is the study of how and why people's political arrangements change over time.


Cultural history   The area of history concerned primarily with changes in how people produce and relate to cultural output – art, media, food, fashion, literature, etc. – over time. 


Word frequency   The proportion in which a particular word is used. Humanities analytics approaches tend to use word frequency as a signal of attention: a higher word frequency suggests people are interested in the concepts the word represents. One can obtain a word frequency simply by counting: count how many times the word 'capitalism' occurs in a single newspaper article, divide that count by the total number of words in the article, and thereby arrive at a word frequency for 'capitalism' in that article. This value may be presented as a decimal or fraction ('capitalism' = 0.001 of words in the article) or as a percentage ('capitalism' = 0.1% of words in the article).
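The counting procedure above can be sketched in a few lines of Python. This is only an illustration, not the code used in the lecture: the function name `word_frequency`, the simple lowercase tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re


def word_frequency(text, word):
    """Return the share of tokens in `text` that equal `word` (case-insensitive)."""
    # Assumption: a token is any run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty article
    return tokens.count(word.lower()) / len(tokens)


# Made-up sample text, for illustration only.
article = "Capitalism shapes markets. Critics of capitalism disagree about markets."
freq = word_frequency(article, "capitalism")

# Show the same value as a decimal and as a percentage.
print(f"{freq:.3f} ({freq:.1%})")
```

Here 'capitalism' accounts for 2 of the 9 words in the sample, so the same value can be reported as roughly 0.222 or 22.2%. Real articles would need more careful tokenization (hyphenated words, plurals, and so on), which this sketch deliberately ignores.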

An extension of this measure used in the lecture is to obtain the frequency of newspaper articles that include a given word: count the number of articles that contain 'capitalism', divide the number of 'capitalism' articles by the total number of articles, and thereby arrive at a frequency for 'capitalism' across a corpus.
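A minimal sketch of this article-level measure, assuming the corpus is available as a plain list of article texts. The function name `article_frequency` and the four sample articles are hypothetical and serve only to illustrate the counting.

```python
import re


def article_frequency(articles, word):
    """Return the share of articles that contain `word` at least once."""
    if not articles:
        return 0.0  # an empty corpus has no meaningful frequency
    # Match the word as a whole word, ignoring case.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    matching = sum(1 for text in articles if pattern.search(text))
    return matching / len(articles)


# Made-up corpus, for illustration only.
corpus = [
    "Capitalism and democracy were debated in the senate.",
    "The harvest was poor this year.",
    "A new theatre opened downtown.",
    "Democracy requires an informed public.",
]

capitalism_share = article_frequency(corpus, "capitalism")  # 1 of 4 articles
democracy_share = article_frequency(corpus, "democracy")    # 2 of 4 articles
print(capitalism_share, democracy_share)
```

Note that an article counts once whether it mentions the word one time or twenty times; this is the difference between the article-level measure and the within-article word frequency.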


Normalizing articles   Articles that are part of a corpus and provide a standard for measurement; the articles that make up the denominator when calculating a word frequency or an article frequency. In many cases, the "normalizing articles" would be equivalent to the "total articles" in a corpus. In the example of the New York Times you will see in the lecture, however, "total articles" is not a number that can be determined explicitly, and thus a proxy is needed.
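To make the role of the denominator concrete, the sketch below divides yearly counts of matching articles by yearly counts of normalizing articles. The proxy used in the lecture is not described here, so the function name `normalized_frequency` and every number in the example are hypothetical placeholders, not real archive counts.

```python
def normalized_frequency(hits_by_year, normalizers_by_year):
    """Divide each year's matching-article count by that year's normalizing count."""
    result = {}
    for year, hit_count in hits_by_year.items():
        denominator = normalizers_by_year.get(year, 0)
        if denominator > 0:  # skip years with no usable denominator
            result[year] = hit_count / denominator
    return result


# Hypothetical counts, for illustration only.
hits = {1930: 12, 1931: 30, 1932: 45}             # articles containing the word
normalizers = {1930: 1200, 1931: 1500, 1932: 1500}  # normalizing articles (the proxy)

frequencies = normalized_frequency(hits, normalizers)
for year in sorted(frequencies):
    print(year, f"{frequencies[year]:.1%}")
```

Dividing by a per-year denominator is what allows years with very different publication volumes to be compared; a raw count of 45 articles means little without knowing how many articles the newspaper printed that year.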