About me
I am a Chair in Data Science and Government at the University of Birmingham. This is a joint professorial appointment between the School of Government and School of Computer Science. My research focuses on Natural Language Processing (NLP) for policy and government. At the University of Birmingham, I am establishing a new Centre for AI and Government (CAIG) as part of a sustained and increasing programme of investment in data science and AI research. Roughly half of my time is seconded to the Institute for Interdisciplinary Data Science and Artificial Intelligence (IIDSAI), where CAIG is also based.
I was previously a Professor of Data Science and Public Policy at the Hertie School of Governance in Berlin. At the Hertie School I set up the Data Science Lab, a policy school centre of competence focusing on data science teaching and research. Before joining the Hertie School, I was a Professor of Public Policy and Data Science at the University of Essex, holding a joint appointment in the Institute for Analytics and Data Science and the Department of Government. At Essex, I served as the Chief Scientific Adviser to Essex County Council, focusing on artificial intelligence and data science in public services. I also worked at University College London and the London School of Economics.
I received a PhD in Political Science from Trinity College Dublin. In the subfield of “Robust NLP” I was working on the problem of noisy labels in political text data. In the context of NLP and machine learning, noisy labels refer to inaccuracies or inconsistencies in the dataset's ground truth labels, which are used for training and evaluating models. Addressing the problem of noisy labels is crucial for building reliable and accurate NLP models, especially when dealing with large-scale datasets or user-generated content, where label noise is often unavoidable.
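A minimal sketch of what label noise does to evaluation, assuming symmetric random flips and three hypothetical topic classes. It is an illustration only, not the method used in the research described above; the function names and the 20% noise rate are assumptions.

```python
import random


def flip_labels(labels, noise_rate, classes, seed=0):
    """Replace each label, with probability noise_rate, by a different class."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Pick any class other than the true one (symmetric noise).
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def accuracy(predictions, labels):
    """Share of predictions that match the given labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical topic classes for political text (an assumption).
classes = ["economy", "health", "security"]
rng = random.Random(42)
true_labels = [rng.choice(classes) for _ in range(1000)]
noisy_labels = flip_labels(true_labels, noise_rate=0.2, classes=classes)

# A hypothetical perfect classifier: its predictions equal the true labels.
predictions = list(true_labels)

print("accuracy against clean labels:", accuracy(predictions, true_labels))
print("accuracy against noisy labels:", accuracy(predictions, noisy_labels))
```

Even a perfect classifier appears to make errors when scored against noisy labels, which is one reason label noise matters for evaluation as well as for training.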
Selected invited talks
Launching the Peace and Security Data Hub
October, 2021
UN World Data Forum Read more
Workshop on Computational Linguistics for Political Text Analysis
September, 2021
CPSS @ KONVENS 2021 Event page
Data Science for Data Driven Public Services
September 22, 2021
GIZ Future Forum - Data For Development Event page
Tracking the Connections Between Public Health and Climate Change
January 27, 2020
Applied Machine Learning Days at Swiss Federal Institute of Technology Lausanne, Lausanne, Switzerland Event page and recording
Complexity and Data Science: Cluster of Methods - pattern analysis, machine learning, causal inference
November 25, 2019
Helmholtz Incubator Information and Data Science Workshop, Berlin, Germany Slide deck
AI for SDG 16 on Peace, Justice, and Strong Institutions: Tracking Progress and Assessing Impact
August 11, 2019
Workshop on Artificial Intelligence and United Nations Sustainable Development Goals, IJCAI International Joint Conferences on Artificial Intelligence, Macao, China Slide deck
AI for Common Good
June 14, 2019
AI TRAPS: Automating Discrimination, Berlin, Germany Slide deck
NLP Applications in Political Science
December 12, 2018
Language and Computation Seminar Series, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK Slide deck
Essex Centre for Data Analytics - a new vision for Essex
December 05, 2018
Innovation Series - Knowledge Gateway, Colchester, UK Slide deck
Data Science and AI for Public Good: Lessons from cross-sectoral collaboration
November 27, 2018
Bringing Data To Life For Policy and Practice: The BLGDRC Conference 2018, London, UK Slide deck
Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014
November 08, 2018
Center for Comparative & International Studies, University of Zurich, Zurich, Switzerland Slide deck
Text Analysis and International Organizations - Tutorial
January 22, 2018
Empirical Research on International Organizations, Lorentz Workshop, Leiden University, Leiden, Netherlands Slide deck
Data science for the public sector
October 31, 2017
The growing ubiquity of algorithms in society: implications, impacts and innovations. The Royal Society Scientific Meeting, London, UK Slide deck
Leadership
Leadership roles and selected grants awarded
The Centre for AI in Government
The Centre for Artificial Intelligence in Government (CAIG) is a research centre at the University of Birmingham, situated within the Institute for Interdisciplinary Data Science and Artificial Intelligence (IIDSAI). It is part of a £5.5 million investment by the University to expand AI and data science in the social sciences. Our primary objective is to connect social sciences and artificial intelligence, fostering collaboration among researchers and supporting the College of Social Sciences (CoSS) in AI-related endeavours. CAIG's mission encompasses research, support, and growth, focusing on internationally recognised research programmes and the cultivation of AI-savvy decision-makers.
Our research interests centre on AI for Democracy and AI for Health and Climate Change. In the AI for Democracy programme, we investigate the impact of generative artificial intelligence on democratic processes. The AI for Health and Climate Change initiative aims to develop cutting-edge surveillance tools to monitor the health implications of climate hazards and predict the health consequences of climate change.
To support social science faculty, CAIG employs a senior research data scientist and two postdocs who help with existing data science and AI research and develop new projects. We also contribute to undergraduate, postgraduate, and executive training programmes, and provide students with opportunities to participate in faculty-led research projects.
Hertie School Data Science Lab
Hertie School is the most important policy school in the EU. I was the founding director of the Hertie School Data Science Lab – a centre of competence in AI and data science. Our mission was to foster, advance and promote a new generation of policy makers comfortable with technical and governance aspects of AI and data science – part of the new Master of Data Science for Public Policy. The research programme of the Lab focused on the applications of AI and data science methods such as computer vision, natural language processing, experimental survey methods, and causal inference to substantive problems in areas including political behaviour, climate change, decision making, and public policy. Research produced by the Lab has appeared in top scientific journals including The Lancet, PNAS, and Nature, and leading machine learning conferences such as NeurIPS, ICML, and IEEE Big Data.
Essex Centre for Data Analytics (ECDA)
At Essex my work focused on embedding artificial intelligence and data science in public service delivery. As Chief Scientific Adviser to Essex County Council I was the University lead on the Essex Innovates programme. Its aims are to make Essex a place that is an exemplar for the integration of data across public bodies; to have the skill, capability and technology to undertake predictive analytics based on high ethical standards; to have a sustainable data infrastructure; and to have the best data science capabilities in the UK to benefit our people and communities. An outcome of Essex Innovates is the creation of an office for data analytics, ECDA: an institutionalised, long-term collaborative effort to tackle public policy issues in Essex. ECDA will deliver on its aims through a data sharing platform, a research and development platform, and an analytical hub pooling the capability across the partnership.
Research
My research primarily focuses on natural language processing (NLP) for policy and government applications. I explore the unique requirements and constraints of NLP in this domain, such as stakeholder involvement, data restrictions, decision-making focus, and political risks. I address two key questions in my work:
- How to make NLP models robust to domain-specific constraints?
- How to ensure their robustness upon deployment in government settings?
I investigate the robustness of NLP models by developing grounded language models using attributional and causal structure elicitation. My work in this area concentrates on health implications of climate change in citizen, corporate, and government communications. My research has been published in journals such as The Lancet, The Lancet Planetary Health, The Lancet Public Health, and The Bulletin of the WHO.
My ongoing work in this area focuses on identifying attributional links between self-reported health effects and climate change. It is funded through a €1m Horizon Europe grant (CATALYSE) to develop a public health digital surveillance system. This is part of The Lancet Countdown: Tracking Progress on Health and Climate Change scientific collaboration.
I have also contributed to the estimation of uncertainty bounds for NLP tasks using resampling and modelling approaches. These contributions have been published in prominent political science journals, including American Journal of Political Science, Political Analysis, British Journal of Political Science, Legislative Studies Quarterly, and American Political Science Review.
Additionally, I work on answering policy questions using small, heterogeneous data through transfer learning. This research has been featured in publications such as Research & Politics, IEEE Big Data, and IJCAI.
Drawing from public management and change management perspectives, I examine the critical factors for sustainable embedding of NLP systems in government decision-making processes. This work is informed by my practical experience at the University of Essex and the Hertie School of Governance in Berlin.
At the University of Essex, I served as Chief Scientific Adviser to Essex County Council, leading NLP initiatives in education and social care delivery. At the Hertie School, I founded a data science lab that contributed to a €240 million German government programme to establish data science centres in all federal ministries.
I have collaborated with international organisations such as the European Commission, United Nations, UNDP, and World Bank on projects ranging from illicit trade detection to post-COVID recovery policies. Some of this work has been published in Philosophical Transactions of the Royal Society A, Computer, Public Policy and Administration, Energy Research & Social Science, Public Administration, and Social Science Computer Review.
Projects funded
CATALYSE
Climate Action To Advance Healthy Societies in Europe
Role: Principal Investigator - working group lead on development of innovative health surveillance and forecasting tools that facilitate effective policy response to environmental health hazards caused by climate change.
- Project goal: Despite clear signs that the impacts of climate change are escalating, the global response has been inadequate. Traditional scientific efforts have fallen short of providing knowledge and tools that have been broadly applied in decision-making, and innovative approaches to knowledge translation are needed. To catalyse climate action in Europe to protect public health, our overarching goal is to provide new knowledge, data, and tools on: i) the relationships between changes in environmental hazards caused by climate change, ecosystems, and human health; ii) the health co-benefits of climate action; iii) the role of health evidence in decision making; and iv) the societal implications of climate change for health systems.
- Funder: Horizon Europe | European Commission
- Total funding: €10.354 million | Hertie allocation €975,000
- Funding period: 01.07.22 → 30.06.27
SCRIPTS I
Contestations of the Liberal Script | Centre of Excellence - “Leader types and Liberal Narratives of the COVID-19 Pandemic”
Role: Principal Investigator
- Project goal: This project compares decision-makers in the pandemic, with a focus on leaders, health and finance ministers. The role of these ministers is taken into account because much debate has revolved around the issues of life versus livelihoods. It considers the degree to which these persons are “experts” in the relevant policy area. It further investigates the extent to which leaders and ministers referred to scientific expertise and, when they did so, which particular disciplines they relied on.
- Funder: German Research Foundation (DFG)
- Total funding: €398,000
SCRIPTS II
Contestations of the Liberal Script | Centre of Excellence - “Data and Methods Centre”
Role: Principal Investigator
- Project goal: The Data and Methodology Center (DMC) contributes to a fruitful collaboration of scholars from a wide variety of disciplines, research traditions, and contexts. Its objective is to ensure and raise the standards of research by: 1) providing training and research consulting; 2) offering a forum for critical reflection about the concepts and methods underlying data collection; 3) discussing methodological innovations, especially those that connect quantitative and qualitative data; 4) assisting with data management and data accessibility; 5) establishing a central data portal to make the collected data available to other scholars (data archive and services) and thus contributing to a growing infrastructure of accessible social science data.
- Funder: German Research Foundation (DFG)
- Total funding: €740,000
MiMac
Mixed methods for analysing what political parties promise to voters during election campaigns
Role: Co-Investigator
- Project goal: For democracy to function effectively, political parties must offer meaningful choices to voters during election campaigns. However, as parties’ communication with voters is becoming increasingly fragmented, targeted and direct, it is becoming impossible for citizens to keep track of what different parties are promising. These new styles of campaigning are also challenging established methods for studying parties’ campaign promises. This project aims to develop innovative methods that, for the first time, will enable researchers to examine the qualitative content of what parties promise in the large quantity of text and speech in election campaigns. The project includes leaders of the world’s largest research group devoted to the qualitative analysis of parties’ campaign promises. It also includes researchers who have developed new and widely used methods for the quantitative analysis of political texts, which detect patterns among words and ideas in large amounts of text. Progress in this field has been stifled by limited dialogue among the proponents of different qualitative and quantitative methods. This project will examine the strengths, limitations and theoretical implications of the full range of methods used in this field. The new methods that we will develop aim to combine the strengths of different approaches. These existing and new methods are highly relevant to the analysis of text and speech in a wide range of social science fields.
- Funder: Bank of Sweden Tercentenary Foundation (Riksbankens Jubileumsfond)
- Total funding: €1.1 million
- Funding period: 01.01.20 → 31.12.22
ESRC
ESRC Business and Local Government Data Research Centre
Role: Co-Investigator and Deputy Director
- Project goal: Funded by the Economic and Social Research Council (ESRC), we aim to be the UK’s centre of choice for data research. We act as a hub of knowledge that reaches beyond Essex into a global network of experts, organisations and innovators. This ensures the far-reaching impact of our best practice models and concepts. Situated in the Knowledge Gateway of the University of Essex, we provide access to funding, training and world-leading expertise in data analytics.
- Funder: ESRC
- Total funding: £1.525 million (total funding including contribution from host institution £3 million)
Lancet Countdown
The Lancet Countdown Commission
Role: Working Group 5 Co-Investigator
- Project goal: The Lancet Countdown on health and climate change is a collaboration involving over 120 leading experts including climate scientists, engineers, economists, political scientists, public health professionals, and doctors from 35 leading academic institutions and UN agencies across the world, including the World Health Organisation, World Meteorological Organisation, World Bank, and European Centre for Disease Prevention and Control. The work of The Lancet Countdown on health and climate change is supported by the Wellcome Trust.
- Funder: The Wellcome Trust Foundation
- Total funding: £5 million
Publications
Selected publications from 2018 to present. A full publication listing can be found on my CV.
- All
- Robust NLP Models
- Sustainable Deployment of NLP Systems in Government
Applying NLP Techniques to Classify Businesses by their International Standard Industrial Classification (ISIC) Code
The paper proposes a novel application of NLP techniques to classify entities by their ISIC code using business descriptions and names, and identifies DistilBERT as the best model for the task, achieving 77.9% average accuracy on a 56-label multiclass classification task.
The 2022 report of the Lancet Countdown on health and climate change: health at the mercy of fossil fuels
The 2022 report of the Lancet Countdown highlights that while the world is facing concurrent systemic shocks, including the COVID-19 pandemic, global energy and cost-of-living crises, and Russia's invasion of Ukraine, climate change's worsening impacts are increasingly affecting human health and wellbeing.
Positive, global, and health or environment framing bolsters public support for climate policies
This study conducted a conjoint experiment of 7,500 adults in five countries to identify climate messages that elicit greater support for policies to tackle climate change and found that a positive frame, health and environmental frames, and global and immediate frames increase public support, with positive and health frames being particularly effective among individuals unconcerned about climate change.
The German coal debate on Twitter: Reactions to a corporate policy process
This study analyzes the German coal debate on Twitter before, during, and after the session of the Coal Commission and finds that the sentiment of the debate becomes increasingly negative and polarized over time, indicating that the Coal Commission did not further consensus in the coal debate on Twitter.
The inclusion of health in major global reports on climate change and biodiversity
This article argues that human health has become a key consideration in recent global reports on climate change and biodiversity produced by various international organisations.
Overcoming the challenges of collaboratively adopting artificial intelligence in the public sector
This study investigates the challenges faced by interorganizational collaborations when adopting AI tools and implementing organizational routines to address them.
Do Intergovernmental Organizations Have a Socialization Effect on Member State Preferences? Evidence from the UN General Debate
We adopt a novel approach to measuring state preferences and whether intergovernmental organizations (IGOs) have a socialization effect on them by applying text analytic methods to country statements in the annual United Nations General Debate (UNGD).
The 2021 report of the Lancet Countdown on health and climate change: code red for a healthy future
The Lancet Countdown is an international collaboration that independently monitors the health consequences of a changing climate. The 44 indicators of this report expose an unabated rise in the health impacts of climate change and the current health consequences of the delayed and inconsistent response of countries around the globe.
Tracking progress on health and climate change in Europe
Left unabated, climate change will have catastrophic effects on the health of present and future generations. Responding to this need, the Lancet Countdown in Europe is established as a transdisciplinary research collaboration for monitoring progress on health and climate change in Europe.
The Challenges of Organizational Factors in Collaborative Artificial Intelligence Projects in the Public Sector
By using a case study that involves a large research university in England and two different county councils in a multi-year collaborative project around AI, we study the challenges that interorganizational collaborations face in adopting AI tools and implementing organizational routines to address them.
Transfer learning for topic labeling: Analysis of the UK House of Commons speeches 1935-2014
We present a transfer topic labeling method that seeks to remedy the issues stemming from the additional step of attaching meaningful labels to estimated topics in Natural Language Processing tasks, using domain-specific codebooks as the knowledge base to automatically label estimated topics.
Engagement with health in national climate change commitments under the Paris Agreement: a global mixed-methods analysis of the nationally determined contributions
In this study, we aimed to examine how public health is incorporated in the nationally determined contributions outlined under the Paris Agreement, and how different patterns of engagement might be related to broader inequalities and tensions in global climate politics.
Intergovernmental engagement on health impacts of climate change
We obtained the texts of countries’ annual statements in United Nations (UN) general debates to examine countries’ engagement with the health impacts of climate change in their formal statements to intergovernmental organizations, and the factors driving engagement.
The 2020 report of The Lancet Countdown on health and climate change: responding to converging crises
The world has already warmed by more than 1.2°C compared with preindustrial levels, resulting in profound, immediate, and rapidly worsening health effects, and moving dangerously close to the agreed limit of maintaining temperatures “well below 2°C”. These health impacts are seen on every continent...
Improving public services by mining citizen feedback: An application of natural language processing
Digital technology has created new methods of collecting user feedback where service users post comments. As topic models can analyse large volumes of feedback, they have been proposed as a feasible approach to aggregating user opinions. This novel approach has been applied to process reviews of primary care practices in England.
Managing artificial intelligence deployment in the public sector
There is a scarcity of empirical evidence surrounding the challenges and approaches to artificial intelligence deployment. Using data analytics, our study moves from speculation to gathering evidence. Our findings show that most challenges arise during implementation and relate to skills, culture, and resistance to share information driven by data challenges.
Big data to the rescue? Challenges in analysing granular household electricity consumption in the United Kingdom
Rapid growth in smart meter installations has given rise to vast collections of data. However, to enable efficient policy interventions, we need to be able to appropriately segment the population of users. The aim of this paper is to consider challenges and opportunities associated with large highly-granular temporal datasets that describe residential electricity consumption.
Intra-cabinet politics and fiscal governance in times of austerity
Why are some governments more effective in controlling spending while others fall prey to excessive overspending by individual cabinet ministers? We approach this question by lifting the veil of collective cabinet responsibility and focusing on intra-cabinet decision-making around budgetary allocation.
Power Plays and Balancing Acts: The Paradoxical Effects of Chinese Trade on African Foreign Policy Positions
This article examines whether trade with China leads African states to adopt more similar foreign policy preferences to China in the United Nations. We examine foreign policy similarity using voting patterns in the United Nations General Assembly and country statements in the United Nations General Debate.
Big Data and AI–A transformational shift for government: So, what next for research?
This study offers an in-depth review of the Policy and Administration literature on the role of Big Data and advanced analytics in the public sector. It provides an overview of the key themes in the research field, namely the application and benefits of Big Data throughout the policy process, and challenges to its adoption and the resulting implications for the public sector.
The 2019 report of The Lancet Countdown on health and climate change: ensuring that the health of a child born today is not defined by a changing climate
The Lancet Countdown is an international, multidisciplinary collaboration, dedicated to monitoring the evolving health profile of climate change, and providing an independent assessment of the delivery of commitments made by governments worldwide under the Paris Agreement.
Multiplex communities and the emergence of international conflict
Advances in community detection reveal new insights into multiplex and multilayer networks. Less work, however, investigates the relationship between these communities and outcomes in social systems. We leverage these advances to shed light on the relationship between the cooperative mesostructure of the international system and the onset of interstate conflict.
AI for SDG-16 on Peace, Justice, and Strong Institutions: Tracking Progress and Assessing Impact
The transition from the Millennium Development Goals (MDGs) to the Sustainable Development Goals (SDGs) brought with it significant changes in the process of creating the goals and with the actual content of the SDGs. We argue that better use of machine learning techniques can help address the challenges of the SDG 16 inclusion.
Artificial intelligence for the public sector: opportunities and challenges of cross-sector collaboration
Public sector organizations are increasingly interested in using data science and artificial intelligence capabilities to deliver policy and generate efficiencies in high-uncertainty environments. The long-term success of data science and artificial intelligence (AI) in the public sector relies on effectively embedding it into delivery solutions for policy implementation.
Teaching
Mathematics for Data Science
This course aims to deliver a compact and tailored introduction to the core mathematical concepts of data science, including linear algebra, probability theory, statistics, and optimisation.
Data Structures and Algorithms
This course begins with an introduction to fundamental programming concepts, presents basic ideas in data structures and algorithms and considers how to write efficient code using established software engineering practices and paradigms.
Machine Learning
The course covers topics in supervised and unsupervised learning, including the most common learning algorithms for regression, classification and clustering, such as random forests, neural networks, and dimensionality reduction techniques.
Natural Language Processing with Deep Learning
This course provides an overview of modern data-driven models through deep learning towards richer structural representations of how words interact to create meaning.
Managing Digitalisation and Artificial Intelligence in Government
This course looks beyond the hype and focuses on the real challenges and opportunities of practical applications of AI for government organisations.
AI for Decision Makers
This course aims to demystify the concepts of artificial intelligence, machine learning and data science, highlighting their direct business and societal benefits while also considering the challenges of their deployment.
Team
Dr. Hannah Béchara
Postdoctoral researcher at the Hertie School
Olga Gasparyan, Ph.D
Postdoctoral researcher at the Hertie School
Paulina García Corral
PhD researcher supervised under SCRIPTS
Krishnamoorthy Manohara
Research Assistant
Radwa Radwan
Research Assistant
Aswin Jose Roy
Research Assistant
Resources
Every year since 1946, representatives of the UN member states gather at the annual sessions of the United Nations General Assembly. The centrepiece of each session is the General Debate. This is a forum at which leaders and other senior officials deliver statements that present their government’s perspective on the major issues in world politics. These statements are akin to the annual legislative state-of-the-union addresses in domestic politics. See more from the UN here
The UN General Debate Corpus (UNGDC) contains transcripts of General Debate statements from 1946 (Session 1) to 2022 (Session 77). Additional information is available from here
We present a database of parliamentary debates that contains the complete record of parliamentary speeches from Dáil Éireann, the lower house and principal chamber of the Irish parliament, from 1919 to 2013. In addition, the database contains background information on all TDs (Teachta Dála, members of parliament), such as their party affiliations, constituencies and office positions. The current version of the database includes close to 4.5 million speeches from 1,178 TDs. The speeches were downloaded from the official parliament website and further processed and parsed. Background information on TDs was collected from the member database of the parliament website. Data on cabinet positions (ministers and junior ministers) was collected from the official website of the government. A record linkage algorithm and human coders were used to match TDs and ministers.
All materials and data for replication of my research and work can be found on the Harvard Dataverse repository below
Explore publication replication materials
Contact
Get in touch
If you would like to contact me directly, please use the email address listed on my C.V. (linked above). I will do my best to get back to you as soon as possible, but often cannot respond to emails as quickly as I would like. Thank you!
Location:
University of Birmingham
Edgbaston, Birmingham, B15 2TT, UK