Understanding State Preferences With Text As Data: Introducing the UN General Debate Corpus

Published in Research & Politics, 2017

Recommended citation: Alexander Baturo, Niheer Dasandi, and Slava Jankin Mikhaylov (2017). "Understanding State Preferences With Text As Data: Introducing the UN General Debate Corpus." Research & Politics, 4(2).

Every year at the United Nations (UN), member states deliver statements during the General Debate (GD) discussing major issues in world politics. These speeches provide invaluable information on governments’ perspectives and preferences on a wide range of issues, but have largely been overlooked in the study of international politics. This paper introduces a new dataset consisting of over 7,300 country statements from 1970–2014. We demonstrate how the UN GD corpus (UNGDC) can be used as a resource from which country positions on different policy dimensions can be derived using text-analytic methods. The article provides applications of these estimates, demonstrating the contribution the UNGDC can make to the study of international politics.

Download paper here

Replication materials here

GitHub repo with UN General Debate corpus here