Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014


Topic models are widely used in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models use unsupervised methods and hence require the additional step of attaching meaningful labels to the estimated topics. Manual labeling does not scale and is prone to human bias. We present a transfer learning approach to topic labeling that leverages an existing domain-specific knowledge base in political science to label topics automatically. These labels can replace human labeling or supplement it, guiding the labeling process toward a more replicable procedure while keeping humans in the loop. We demonstrate our approach with a large-scale topic model analysis of the complete corpus of UK House of Commons speeches from 1935 to 2014, using the coding instructions of the Comparative Agendas Project (CAP) to label topics. We evaluate our results against human expert coding. We show that our approach works well for a majority of the topics we estimate, but we also find that institution-specific topics, in particular those on subnational governance, require manual input.
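To make the idea of reusing a codebook as a knowledge base concrete, the sketch below labels each estimated topic with the category whose description is most similar to the topic's top words. It is a minimal illustration under stated assumptions, not the method used in this paper: it uses plain bag-of-words cosine similarity, and the function names (`label_topics`, `cosine`), the `min_score` threshold, and the category descriptions are hypothetical stand-ins for the actual CAP coding instructions. Topics whose best score does not exceed the threshold are returned with no label, mirroring the point that some topics still need a human in the loop.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_topics(topic_top_words, category_descriptions, min_score=0.0):
    """Hypothetical helper: map each topic to its most similar codebook category.

    Returns {topic_id: (label, score)}; label is None when the best score does
    not exceed min_score, flagging the topic for manual labeling.
    """
    category_vectors = {
        name: Counter(tokenize(desc)) for name, desc in category_descriptions.items()
    }
    labels = {}
    for topic_id, words in topic_top_words.items():
        topic_vector = Counter(w.lower() for w in words)
        scores = {name: cosine(topic_vector, vec) for name, vec in category_vectors.items()}
        best = max(scores, key=scores.get)
        if scores[best] <= min_score:
            labels[topic_id] = (None, scores[best])  # no confident match: needs a human
        else:
            labels[topic_id] = (best, scores[best])
    return labels


# Hypothetical, paraphrased category descriptions (not the real CAP codebook text).
categories = {
    "Health": "health care hospitals doctors nurses medical insurance",
    "Defence": "military armed forces army navy defence weapons",
    "Education": "schools teachers universities pupils education",
}
# Hypothetical top words for three estimated topics.
topics = {
    0: ["hospital", "nurses", "nhs", "patients", "doctors"],
    1: ["army", "forces", "navy", "troops", "defence"],
    2: ["council", "local", "scotland"],
}
labels = label_topics(topics, categories)
```

In this toy run, topics 0 and 1 receive the Health and Defence labels, while topic 2 shares no terms with any description and is left unlabeled. A realistic pipeline would likely use richer representations (for example TF-IDF weighting or embeddings) and the full codebook text rather than raw word overlap.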
