Natural Language Processing for Policy and Government

About me

I am a Chair in Data Science and Government at the University of Birmingham. This is a joint professorial appointment between the School of Government and School of Computer Science. My research focuses on Natural Language Processing (NLP) for policy and government. At the University of Birmingham, I am establishing a new Centre for AI and Government (CAIG) as part of a sustained and increasing programme of investment in data science and AI research. Roughly half of my time is seconded to the Institute for Interdisciplinary Data Science and Artificial Intelligence (IIDSAI), where CAIG is also based.

I was previously a Professor of Data Science and Public Policy at the Hertie School of Governance in Berlin. At the Hertie School I set up the Data Science Lab – a policy school centre of competence focusing on data science teaching and research. Before joining the Hertie School, I was a Professor of Public Policy and Data Science at the University of Essex, holding a joint appointment in the Institute for Analytics and Data Science and the Department of Government. At Essex, I served as Chief Scientific Adviser to Essex County Council, focusing on artificial intelligence and data science in public services. I also worked at University College London and the London School of Economics.

I received a PhD in Political Science from Trinity College Dublin. Within the subfield of “Robust NLP”, I worked on the problem of noisy labels in political text data. In NLP and machine learning, noisy labels are inaccuracies or inconsistencies in a dataset's ground-truth labels, which are used for training and evaluating models. Addressing noisy labels is crucial for building reliable and accurate NLP models, especially with large-scale datasets or user-generated content, where label noise is often unavoidable.
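To make the effect of label noise concrete, here is a minimal Python sketch. It injects symmetric noise into a set of labels and measures how far the noisy labels drift from the ground truth. The function names, the three topic classes, and the 20% noise rate are illustrative assumptions, not taken from any study mentioned on this page.

```python
import random


def flip_labels(labels, noise_rate, classes, seed=0):
    """Return a copy of `labels` in which each label is replaced, with
    probability `noise_rate`, by a different class chosen uniformly."""
    rng = random.Random(seed)
    noisy = []
    for label in labels:
        if rng.random() < noise_rate:
            alternatives = [c for c in classes if c != label]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(label)
    return noisy


def agreement(a, b):
    """Share of positions at which two label sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical topic labels for 3,000 political text snippets.
classes = ["economy", "health", "security"]
true_labels = [classes[i % 3] for i in range(3000)]

# Simulate annotators who mislabel roughly one snippet in five.
noisy_labels = flip_labels(true_labels, noise_rate=0.2, classes=classes)
measured = agreement(true_labels, noisy_labels)
print(f"Agreement between true and noisy labels: {measured:.3f}")
```

Under these assumptions, even a classifier that predicts every true label correctly would score only about 80% when evaluated against the noisy labels. This is one reason label noise matters for evaluation as well as for training.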

Selected invited talks

Launching the Peace and Security Data Hub

October, 2021
UN World Data Forum Read more

Workshop on Computational Linguistics for Political Text Analysis

September, 2021
CPSS @ KONVENS 2021 Event page

Data Science for Data Driven Public Services

September 22, 2021
GIZ Future Forum - Data For Development Event page

Tracking the Connections Between Public Health and Climate Change

January 27, 2020
Applied Machine Learning Days at Swiss Federal Institute of Technology Lausanne, Lausanne, Switzerland Event page and recording

Complexity and Data Science: Cluster of Methods - pattern analysis, machine learning, causal inference

November 25, 2019
Helmholtz Incubator Information and Data Science Workshop, Berlin, Germany Slide deck

AI for SDG 16 on Peace, Justice, and Strong Institutions: Tracking Progress and Assessing Impact

August 11, 2019
Workshop on Artificial Intelligence and United Nations Sustainable Development Goals, IJCAI International Joint Conferences on Artificial Intelligence, Macao, China Slide deck

AI for Common Good

June 14, 2019
AI TRAPS: Automating Discrimination, Berlin, Germany Slide deck

NLP Applications in Political Science

December 12, 2018
Language and Computation Seminar Series, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK Slide deck

Essex Centre for Data Analytics - a new vision for Essex

December 05, 2018
Innovation Series - Knowledge Gateway, Colchester, UK Slide deck

Data Science and AI for Public Good: Lessons from cross-sectoral collaboration

November 27, 2018
Bringing Data To Life For Policy and Practice: The BLGDRC Conference 2018, London, UK Slide deck

Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014

November 08, 2018
Center for Comparative & International Studies, University of Zurich, Zurich, Switzerland Slide deck

Text Analysis and International Organizations - Tutorial

January 22, 2018
Empirical Research on International Organizations, Lorentz Workshop, Leiden University, Leiden, Netherlands Slide deck

Data science for the public sector

October 31, 2017
The growing ubiquity of algorithms in society: implications, impacts and innovations. The Royal Society Scientific Meeting, London, UK Slide deck

Slava Jankin


Leadership roles and selected grants awarded

The Centre for AI in Government

The Centre for Artificial Intelligence in Government (CAIG) is a research centre at the University of Birmingham, situated within the Institute for Interdisciplinary Data Science and Artificial Intelligence (IIDSAI). It’s part of a £5.5 million investment by the University to expand AI and data science in the social sciences. Our primary objective is to connect social sciences and artificial intelligence, fostering collaboration among researchers and supporting the College of Social Sciences (CoSS) in AI-related endeavours. CAIG's mission encompasses research, support, and growth, focusing on internationally recognised research programmes and the cultivation of AI-savvy decision-makers.

Our research interests centre on AI for Democracy and AI for Health and Climate Change. In the AI for Democracy programme, we investigate the impact of generative artificial intelligence on democratic processes. The AI for Health and Climate Change initiative aims to develop cutting-edge surveillance tools to monitor the health implications of climate hazards and predict the health consequences of climate change.

To support social science faculty, CAIG employs a senior research data scientist and two postdocs, who help with existing data science and AI research and develop new projects. We also contribute to undergraduate, postgraduate, and executive training programmes, and provide students with opportunities to participate in faculty-led research projects.

Data Lab

Hertie School Data Science Lab

Hertie School is the most important policy school in the EU. I was the founding director of the Hertie School Data Science Lab – a centre of competence in AI and data science. Our mission was to foster, advance and promote a new generation of policy makers comfortable with technical and governance aspects of AI and data science – part of the new Master of Data Science for Public Policy. The research programme of the Lab focused on the applications of AI and data science methods such as computer vision, natural language processing, experimental survey methods, and causal inference to substantive problems in areas including political behaviour, climate change, decision making, and public policy. Research produced by the Lab has appeared in top scientific journals including The Lancet, PNAS, and Nature, and leading machine learning conferences such as NeurIPS, ICML, and IEEE Big Data.

Learn More

Essex Centre for Data Analytics (ECDA)

At Essex my work focused on embedding artificial intelligence and data science in public service delivery. As Chief Scientific Adviser to Essex County Council I was the University lead on the Essex Innovates programme. Its aims are to make Essex an exemplar for the integration of data across public bodies; to have the skill, capability and technology to undertake predictive analytics based on high ethical standards; to have a sustainable data infrastructure; and to have the best data science capabilities in the UK to benefit our people and communities. An outcome of the Essex Innovates programme is the creation of an office for data analytics, ECDA: an institutionalised, long-term collaborative effort to tackle public policy issues in Essex. ECDA will deliver on its aims through a data sharing platform, a research and development platform, and an analytical hub pooling the capability across the partnership.

Learn More


My research primarily focuses on natural language processing (NLP) for policy and government applications. I explore the unique requirements and constraints of NLP in this domain, such as stakeholder involvement, data restrictions, decision-making focus, and political risks. I address two key questions in my work:

  • How to make NLP models robust to domain-specific constraints?
  • How to ensure their robustness upon deployment in government settings?

I investigate the robustness of NLP models by developing grounded language models using attributional and causal structure elicitation. My work in this area concentrates on health implications of climate change in citizen, corporate, and government communications. My research has been published in journals such as The Lancet, The Lancet Planetary Health, The Lancet Public Health, and The Bulletin of the WHO.

My ongoing work in this area focuses on identifying attributional links between self-reported health effects and climate change. It is funded through a €1m Horizon Europe grant (CATALYSE) to develop a public health digital surveillance system. This is part of The Lancet Countdown: Tracking Progress on Health and Climate Change scientific collaboration.

I have also contributed to the estimation of uncertainty bounds for NLP tasks using resampling and modelling approaches. These contributions have been published in prominent political science journals, including American Journal of Political Science, Political Analysis, British Journal of Political Science, Legislative Studies Quarterly, and American Political Science Review.
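As a generic illustration of the resampling idea, the sketch below computes a percentile bootstrap interval for a simple text-derived quantity: the share of sentences in a document that mention health. It is a textbook percentile bootstrap under stated assumptions, not necessarily the procedure used in the papers above. The data and the names `bootstrap_interval` and `mentions` are hypothetical.

```python
import random


def bootstrap_interval(values, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `statistic` computed on `values`.

    Each resample draws len(values) items with replacement; the interval
    is read off the sorted resampled statistics.
    """
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


# Hypothetical coding of 100 sentences: 1 if the sentence mentions health.
mentions = [1] * 30 + [0] * 70

point = mean(mentions)
low, high = bootstrap_interval(mentions, mean)
print(f"Share of health sentences: {point:.2f} (95% interval {low:.2f} to {high:.2f})")
```

Resampling whole sentences treats each sentence as the unit of observation. A different unit (words, paragraphs, speeches) would give a different interval, so that choice is a modelling decision rather than a detail.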

Additionally, I work on answering policy questions using small, heterogeneous data through transfer learning. This research has been featured in publications such as Research & Politics, IEEE Big Data, and IJCAI.

Drawing from public management and change management perspectives, I examine the critical factors for sustainable embedding of NLP systems in government decision-making processes. This work is informed by my practical experience at the University of Essex and the Hertie School of Governance in Berlin.

At the University of Essex, I served as Chief Scientific Adviser to Essex County Council, leading NLP initiatives in education and social care delivery. At the Hertie School, I founded a data science lab that contributed to a €240 million German government programme to establish data science centres in all federal ministries.

I have collaborated with international organisations such as the European Commission, United Nations, UNDP, and World Bank on projects ranging from illicit trade detection to post-COVID recovery policies. Some of this work has been published in Philosophical Transactions of the Royal Society A, Computer, Public Policy and Administration, Energy Research & Social Science, Public Administration, and Social Science Computer Review.

Projects funded


Climate Action To Advance Healthy Societies in Europe

Role: Principal Investigator - working group lead on development of innovative health surveillance and forecasting tools that facilitate effective policy response to environmental health hazards caused by climate change.

  • Project goal: Despite clear signs that the impacts of climate change are escalating, the global response has been inadequate. Traditional scientific efforts have fallen short of providing knowledge and tools that have been broadly applied in decision-making, and innovative approaches to knowledge translation are needed. To catalyse climate action in Europe to protect public health, our overarching goal is to provide new knowledge, data, and tools on: i) the relationships between changes in environmental hazards caused by climate change, ecosystems, and human health; ii) the health co-benefits of climate action; iii) the role of health evidence in decision making; and iv) the societal implications of climate change for health systems.
  • Funder: Horizon Europe | European Commission
  • Total funding: €10.354 million | Hertie allocation €975,000
  • Funding period: 01.07.22 → 30.06.27


Contestations of the Liberal Script | Centre of Excellence - “Leader types and Liberal Narratives of the COVID-19 Pandemic”

Role: Principal Investigator

  • Project goal: This project compares decision-makers in the pandemic, with a focus on leaders, health and finance ministers. The role of these ministers is taken into account because much debate has revolved around the issues of life versus livelihoods. It considers the degree to which these persons are “experts” in the relevant policy area. It further investigates the extent to which leaders and ministers referred to scientific expertise and, when they did so, which particular disciplines they relied on.
  • Funder: German Research Council (DFG)
  • Total funding: €398,000
Read More


Contestations of the Liberal Script | Centre of Excellence - “Data and Methods Centre”

Role: Principal Investigator

  • Project goal: The Data and Methodology Center (DMC) contributes to a fruitful collaboration of scholars from a wide variety of disciplines, research traditions, and contexts. Its objective is to ensure and raise the standards of research by: 1) providing training and research consulting; 2) offering a forum for critical reflection about the concepts and methods underlying data collection; 3) discussing methodological innovations, especially those that connect quantitative and qualitative data; 4) assisting with data management and data accessibility; 5) establishing a central data portal to make the collected data available to other scholars (data archive and services) and thus contributing to a growing infrastructure of accessible social science data.
  • Funder: German Research Council (DFG)
  • Total funding: €740,000
Read More


Mixed methods for analysing what political parties promise to voters during election campaigns

Role: Co-Investigator

  • Project goal: For democracy to function effectively, political parties must offer meaningful choices to voters during election campaigns. However, as parties’ communication with voters is becoming increasingly fragmented, targeted and direct, it is becoming impossible for citizens to keep track of what different parties are promising. These new styles of campaigning are also challenging established methods for studying parties’ campaign promises. This project aims to develop innovative new methods that for the first time will enable researchers to examine the qualitative content of what parties promise in the large quantity of text and speech in election campaigns. The project includes leaders of the world’s largest research group devoted to the qualitative analysis of parties’ campaign promises. It also includes researchers who have developed new and widely used methods for the quantitative analysis of political texts, which detect patterns among words and ideas in large amounts of text. Progress in this field has been stifled by limited dialogue among the proponents of different qualitative and quantitative methods. This project will examine the strengths, limitations and theoretical implications of the full range of methods used in this field. The new methods that we will develop aim to combine the strengths of different approaches. These existing and new methods are highly relevant to the analysis of text and speech in a wide range of social science fields.
  • Funder: Bank of Sweden Tercentenary Foundation (Riksbankens Jubileumsfond)
  • Total funding: €1.1 million
  • Funding period: 1.01.20 → 31.12.22
Read More


ESRC Business and Local Government Data Research Centre

Role: Co-Investigator and Deputy Director

  • Project goal: Funded by the Economic and Social Research Council (ESRC), we aim to be the UK’s centre of choice for data research. We act as a hub of knowledge that reaches beyond Essex into a global network of experts, organisations and innovators. This ensures the far-reaching impact of our best practice models and concepts. Situated in the Knowledge Gateway of the University of Essex, we provide access to funding, training and world-leading expertise in data analytics.
  • Funder: ESRC
  • Total funding: £1.525 million (total funding including contribution from host institution £3 million)
Read More

Lancet Countdown

The Lancet Countdown Commission

Role: Working Group 5 Co-Investigator

  • Project goal: The Lancet Countdown on health and climate change is a collaboration involving over 120 leading experts including climate scientists, engineers, economists, political scientists, public health professionals, and doctors from 35 leading academic institutions and UN agencies across the world, including the World Health Organisation, World Meteorological Organisation, World Bank, European Centre for Disease Control and Prevention, and many of the world’s leading academic institutions. The work of The Lancet Countdown on health and climate change is supported by the Wellcome Trust.
  • Funder: The Wellcome Trust Foundation
  • Total funding: £5 million
Read More


Selected publications from 2018 to present. A full publication list can be found on my CV.

  • All
  • Robust NLP Models
  • Sustainable Deployment of NLP Systems in Government

Applying NLP Techniques to Classify Businesses by their International Standard Industrial Classification (ISIC) Code

The paper proposes a novel application of NLP techniques to classify entities by their ISIC code using business descriptions and names, and identifies DistilBERT as the best model for the task, achieving 77.9% average accuracy on a 56 label multiclass classification task.

Read More

The 2022 report of the Lancet Countdown on health and climate change: health at the mercy of fossil fuels

The 2022 report of the Lancet Countdown highlights that while the world is facing concurrent systemic shocks, including the COVID-19 pandemic, global energy and cost-of-living crises, and Russia's invasion of Ukraine, climate change's worsening impacts are increasingly affecting human health and wellbeing.

Read More

Positive, global, and health or environment framing bolsters public support for climate policies

This study conducted a conjoint experiment of 7,500 adults in five countries to identify climate messages that elicit greater support for policies to tackle climate change and found that a positive frame, health and environmental frames, and global and immediate frames increase public support, with positive and health frames being particularly effective among individuals unconcerned about climate change.

Read More

The German coal debate on Twitter: Reactions to a corporate policy process

This study analyzes the German coal debate on Twitter before, during, and after the session of the Coal Commission and finds that the sentiment of the debate becomes increasingly negative and polarized over time, indicating that the Coal Commission did not further consensus in the coal debate on Twitter.

Read More

The inclusion of health in major global reports on climate change and biodiversity

This article argues that human health has become a key consideration in recent global reports on climate change and biodiversity produced by various international organisations.

Read More

Overcoming the challenges of collaboratively adopting artificial intelligence in the public sector

This study investigates the challenges faced by interorganizational collaborations when adopting AI tools and implementing organizational routines to address them.

Read More

Do Intergovernmental Organizations Have a Socialization Effect on Member State Preferences? Evidence from the UN General Debate

We adopt a novel approach to measuring state preferences and whether intergovernmental organizations (IGOs) have a socialization effect on them by applying text analytic methods to country statements in the annual United Nations General Debate (UNGD).

Read More

The 2021 report of the Lancet Countdown on health and climate change: code red for a healthy future

The Lancet Countdown is an international collaboration that independently monitors the health consequences of a changing climate. The 44 indicators of this report expose an unabated rise in the health impacts of climate change and the current health consequences of the delayed and inconsistent response of countries around the globe.

Read More

Tracking progress on health and climate change in Europe

Left unabated, climate change will have catastrophic effects on the health of present and future generations. Responding to this need, the Lancet Countdown in Europe is established as a transdisciplinary research collaboration for monitoring progress on health and climate change in Europe.

Read More

The Challenges of Organizational Factors in Collaborative Artificial Intelligence Projects in the Public Sector

By using a case study that involves a large research university in England and two different county councils in a multi-year collaborative project around AI, we study the challenges that interorganizational collaborations face in adopting AI tools and implementing organizational routines to address them.

Read More

Transfer learning for topic labeling: Analysis of the UK House of Commons speeches 1935-2014

We present a transfer topic labeling method that seeks to remedy the issues stemming from the additional step of attaching meaningful labels to estimated topics in Natural Language Processing tasks, using domain-specific codebooks as the knowledge base to automatically label estimated topics.

Read More

Engagement with health in national climate change commitments under the Paris Agreement: a global mixed-methods analysis of the nationally determined contributions

In this study, we aimed to examine how public health is incorporated in the nationally determined contributions outlined under the Paris Agreement, and how different patterns of engagement might be related to broader inequalities and tensions in global climate politics.

Read More

Intergovernmental engagement on health impacts of climate change

We obtained the texts of countries’ annual statements in United Nations (UN) general debates to examine countries’ engagement with the health impacts of climate change in their formal statements to intergovernmental organizations, and the factors driving engagement.

Read More

The 2020 report of The Lancet Countdown on health and climate change: responding to converging crises

The world has already warmed by more than 1.2°C compared with preindustrial levels, resulting in profound, immediate, and rapidly worsening health effects, and moving dangerously close to the agreed limit of maintaining temperatures “well below 2°C”. These health impacts are seen on every continent...

Read More

Improving public services by mining citizen feedback: An application of natural language processing

Digital technology has created new methods of collecting user feedback where service users post comments. As topic models can analyse large volumes of feedback, they have been proposed as a feasible approach to aggregating user opinions. This novel approach has been applied to process reviews of primary care practices in England.

Read More

Managing artificial intelligence deployment in the public sector

There is a scarcity of empirical evidence surrounding the challenges and approaches to artificial intelligence deployment. Using data analytics, our study moves from speculation to gathering evidence. Our findings show that most challenges arise during implementation and relate to skills, culture, and resistance to share information driven by data challenges.

Read More

Big data to the rescue? Challenges in analysing granular household electricity consumption in the United Kingdom

Rapid growth in smart meter installations has given rise to vast collections of data. However, to enable efficient policy interventions, we need to be able to appropriately segment the population of users. The aim of this paper is to consider challenges and opportunities associated with large highly-granular temporal datasets that describe residential electricity consumption.

Read More

Intra-cabinet politics and fiscal governance in times of austerity

Why are some governments more effective in controlling spending while others fall prey to excessive overspending by individual cabinet ministers? We approach this question by lifting the veil of collective cabinet responsibility and focusing on intra-cabinet decision-making around budgetary allocation.

Read More

Power Plays and Balancing Acts: The Paradoxical Effects of Chinese Trade on African Foreign Policy Positions

This article examines whether trade with China leads African states to adopt more similar foreign policy preferences to China in the United Nations. We examine foreign policy similarity using voting patterns in the United Nations General Assembly and country statements in the United Nations General Debate.

Read More

Big Data and AI–A transformational shift for government: So, what next for research?

This study offers an in-depth review of the Policy and Administration literature on the role of Big Data and advanced analytics in the public sector. It provides an overview of the key themes in the research field, namely the application and benefits of Big Data throughout the policy process, and challenges to its adoption and the resulting implications for the public sector.

Read More

The 2019 report of The Lancet Countdown on health and climate change: ensuring that the health of a child born today is not defined by a changing climate

The Lancet Countdown is an international, multidisciplinary collaboration, dedicated to monitoring the evolving health profile of climate change, and providing an independent assessment of the delivery of commitments made by governments worldwide under the Paris Agreement.

Read More

Multiplex communities and the emergence of international conflict

Advances in community detection reveal new insights into multiplex and multilayer networks. Less work, however, investigates the relationship between these communities and outcomes in social systems. We leverage these advances to shed light on the relationship between the cooperative mesostructure of the international system and the onset of interstate conflict.

Read More

AI for SDG-16 on Peace, Justice, and Strong Institutions: Tracking Progress and Assessing Impact

The transition from the Millennium Development Goals (MDGs) to the Sustainable Development Goals (SDGs) brought with it significant changes in the process of creating the goals and with the actual content of the SDGs. We argue that better use of machine learning techniques can help address the challenges of the SDG 16 inclusion.

Read More

Artificial intelligence for the public sector: opportunities and challenges of cross-sector collaboration

Public sector organizations are increasingly interested in using data science and artificial intelligence capabilities to deliver policy and generate efficiencies in high-uncertainty environments. The long-term success of data science and artificial intelligence (AI) in the public sector relies on effectively embedding it into delivery solutions for policy implementation.

Read More



Mathematics for Data Science

This course aims to deliver a compact and tailored introduction to the core mathematical concepts of data science, including linear algebra, probability theory, statistics, and optimisation.

Data Structures and Algorithms

This course begins with an introduction to fundamental programming concepts, presents basic ideas in data structures and algorithms and considers how to write efficient code using established software engineering practices and paradigms.

Machine Learning

The course covers topics in supervised and unsupervised learning, including the most common learning algorithms for regression, classification and clustering, such as random forests, neural networks, and dimensionality reduction techniques.

Natural Language Processing with Deep Learning

This course provides an overview of modern data-driven models through deep learning towards richer structural representations of how words interact to create meaning.

Managing Digitalisation and Artificial Intelligence in Government

This course looks beyond the hype and focuses on the real challenges and opportunities of practical applications of AI for government organisations.

AI for Decision Makers

This course aims to demystify the concepts of artificial intelligence, machine learning and data science, highlighting their direct business and societal benefits while also considering the challenges of their deployment.


Dr. Hannah Béchara

Postdoctoral researcher at the Hertie School

Olga Gasparyan, Ph.D

Postdoctoral researcher at the Hertie School

Paulina García Corral

PhD researcher supervised under SCRIPTS


Krishnamoorthy Manohara

Research Assistant

Radwa Radwan

Research Assistant

Aswin Jose Roy

Research Assistant


Every year since 1946, representatives of the UN member states gather at the annual sessions of the United Nations General Assembly. The centrepiece of each session is the General Debate. This is a forum at which leaders and other senior officials deliver statements that present their government’s perspective on the major issues in world politics. These statements are akin to the annual legislative state-of-the-union addresses in domestic politics. See more from the UN here

This dataset, the UN General Debate Corpus (UNGDC), contains the transcripts of General Debate statements from 1946 (Session 1) to 2022 (Session 77). Additional information is available here

Explore the dataset

We present a database of parliamentary debates that contains the complete record of parliamentary speeches from Dáil Éireann, the lower house and principal chamber of the Irish parliament, from 1919 to 2013. In addition, the database contains background information on all TDs (Teachta Dála, members of parliament), such as their party affiliations, constituencies and office positions. The current version of the database includes close to 4.5 million speeches from 1,178 TDs. The speeches were downloaded from the official parliament website and further processed and parsed. Background information on TDs was collected from the member database of the parliament website. Data on cabinet positions (ministers and junior ministers) was collected from the official website of the government. A record linkage algorithm and human coders were used to match TDs and ministers.

Explore the dataset

All materials and data for replication of my research and work can be found on the Harvard Dataverse repository below

Explore publication replication materials


Get in touch

If you would like to contact me directly, please use the email address listed on my C.V. (linked above). I will do my best to get back to you as soon as possible, but often cannot respond to emails as quickly as I would like. Thank you!


University of Birmingham

Edgbaston, Birmingham, B15 2TT, UK