Linguistic Themes and Political Trends

Ultimately, I would like to combine my interests in politics and data science to analyze political trends. While I currently have no faculty mentor for that project, I've been piecing together the tools I need for my first foray into the subject. I have been trying to answer whether linguistic themes in the speeches of political leaders can predict the peaks and valleys of international tensions before they occur.

Obviously, part of this relationship is causal. When North Korea threatens to bomb the United States, tensions between the two nations rise. What interests me are the relationships that are less obvious to the casual observer. If the president of the United States starts using more language that we qualitatively tag as ‘patriotic’, will that predict a rise in tensions with other nations? A fall? If there is some relationship, which nations are affected?

To begin answering this question, I am looking at speeches by leaders of the U.S. and the Soviet Union during the Cold War. This period makes it easier to gauge whether my chosen predictors are succeeding, since it has clear rises and falls in tension to use as reference points. Ultimately, I hope to apply semi-supervised cluster analysis and natural language processing to answer this question for the Cold War period. The semi-supervised approach would let me limit the time window within which a linguistic shift can count as a predictor, and screen out clear causal relationships. I then want to see whether those linguistic trends can be extrapolated to the modern day. If not, what linguistic changes occur?

I have done some preliminary qualitative tagging of 60 speeches, and the trends I noticed look promising. That is too little data to say anything concrete just yet, but it's enough to make me want to develop the skill set to tackle this problem properly.