The Global Data Barometer (GDB) seeks to fill data gaps, generate insights, and support debate about our data futures. It will produce a benchmark study to support cross-country comparison and learning, and it will provide rich primary data to fuel sector- and issue-specific data-related initiatives.
Data is shaping our world and debates on data issues surround us, yet we lack a clear view on how data policies and practices are unfolding in different countries, regions, and sectors. GDB is bringing together a research network that will work with a common methodology to survey laws and policies, data availability, data capabilities, and data use across a range of settings.
Our goal is to provide longitudinal data, to follow the success of the Open Data Barometer in providing benchmarks and evidence that can drive policy making, open dialogue, and further empirical research. We have a particular emphasis on improving understanding of the data landscape in low- and middle-income countries and understanding of data for development.
The GDB is a multi-dimensional index, built of components and sub-components which are in turn built from composite indicators that combine primary and secondary data.
This means that, in order to develop an overall comparative assessment of the extent to which countries (or regions) govern and use data for the public good, we break the concept of ‘data for the public good’ into various individual components and sub-components, each assessed separately. We then aggregate the results into an overall score.
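As a rough illustration of this roll-up, the sketch below averages indicator scores into sub-components, sub-components into components, and components into an overall score. The four component names follow the working model described in this document, but the sub-component names, the scores, and the use of an unweighted mean at every level are assumptions made for illustration; the actual weighting and aggregation rules are not specified here.

```python
from statistics import mean

# Hypothetical indicator scores (0-10) for a single country.
# Structure: component -> sub-component -> list of indicator scores.
# Sub-component names and all values are invented for illustration.
country_scores = {
    "governance": {
        "data_protection": [8.0, 6.0],
        "data_sharing": [4.0],
    },
    "capability": {
        "skills": [5.0, 7.0],
    },
    "availability": {
        "procurement_data": [9.0, 7.0],
    },
    "use_and_impact": {
        "accountability_uses": [3.0],
    },
}


def sub_component_score(indicators):
    """Unweighted mean of the indicator scores (an assumption)."""
    return mean(indicators)


def component_score(sub_components):
    """Unweighted mean of the sub-component scores (an assumption)."""
    return mean(sub_component_score(v) for v in sub_components.values())


def overall_score(components):
    """Unweighted mean of the component scores (an assumption)."""
    return mean(component_score(v) for v in components.values())


for name, subs in country_scores.items():
    print(name, component_score(subs))
print("overall", overall_score(country_scores))
```

Keeping each level as a separate function means the component and sub-component scores remain available for comparison on their own, rather than being visible only through the overall figure.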
Much of the value of a multi-dimensional index like GDB lies not in its overall scores, but in the opportunities it offers to compare countries’ relative strengths and weaknesses at the component or sub-component level. For example, the GDB will make it possible to explore which countries have good data availability but limited capability to make use of that data; or, within GDB’s governance component, the relative strength of data protection and data-sharing policies—and whether these strike the balance required to support public good outcomes in a datified society.
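To make this kind of comparison concrete, here is a minimal sketch that picks out countries scoring high on one component and low on another. The country labels, the scores, and the thresholds (`high` and `low`) are all hypothetical; the GDB does not define such cut-offs in this document.

```python
# Hypothetical component-level scores (0-10) per country.
component_scores = {
    "Country A": {"governance": 7.0, "capability": 3.5,
                  "availability": 8.0, "use_and_impact": 4.0},
    "Country B": {"governance": 6.0, "capability": 7.5,
                  "availability": 8.5, "use_and_impact": 6.5},
    "Country C": {"governance": 2.0, "capability": 3.0,
                  "availability": 3.0, "use_and_impact": 1.5},
}


def strong_but_weak(scores, strong, weak, high=7.0, low=4.0):
    """Return countries at or above `high` on one component
    and below `low` on another. Thresholds are illustrative."""
    return sorted(
        country
        for country, s in scores.items()
        if s[strong] >= high and s[weak] < low
    )


# Countries with good data availability but limited capability.
print(strong_but_weak(component_scores, "availability", "capability"))
```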
While we’ll be refining the exact structure of the GDB over the coming months, our working model is shown in the diagram below.
The first two components, Governance and Capability, will combine country-wide indicators that capture general features of the data landscape, with indicators that look at governance and capability in the context of specific sectors at a national level. The Availability and Use & Impact components will draw more heavily (though not exclusively) on thematic indicators, investigating the availability of specific representative datasets as well as particular use-cases for open data, data sharing, and data-protection practices.
Our predecessor study, the Open Data Barometer, focused solely on improving the availability and impact of open datasets. In the Global Data Barometer, we recognise a more complex landscape—one where policy must combine good governance of data with ongoing efforts to promote data re-usability, and to secure the benefits that open public data infrastructures can bring. We aim to provide policymakers and advocates with tools to navigate this landscape.
The sub-components of our study variously speak to:
- governing data for the public good
- equipping a country to use data for the public good
- providing (open) data for the public good
- using data for the public good
Our working definition of public good unpacks these different issues, recognising that public good itself should be understood as a dynamic and democratic concept:
Data is a source of power. It can be exploited for private gain, and used to limit freedom, or it can be deployed and governed for the public good: a resource for tackling health, social and environmental challenges, enabling collaboration, driving innovation and improving accountability.
What constitutes the public good can ultimately only be determined through open public debate, and different ‘publics’ may have different priorities at different times. We draw on established global principles, in particular the Sustainable Development Goals and the international human rights framework, to give content to the idea of the public good used within the Global Data Barometer.
Ensuring that data is a resource for the public good involves a variety of interventions, depending both on the data in question, and the wider context. In some cases, it involves robust governance and control of data, so that risks of data abuse are minimised. In other cases, it involves promoting and supporting data re-use, to maximise the social value generated from data. In others again, it requires capacity building to equip diverse actors to work with data. In many cases, it involves all of these.
Data for the public good cannot be studied in the abstract. Data is always about something. The appropriate ways to govern, provide, and use any dataset will depend on the people, problems, and potential that it relates to.
This awareness grounds GDB’s thematic approach: We look to identify exemplar categories of data within broad thematic areas. We then study to what extent policies provide for well-managed data; to what extent capacity exists to use that data; whether the data is provided as shared or open data; and whether the data is being used in ways that we anticipate can contribute to the public good.
Our thematic lens is both practical and strategic: we look for examples of data that can reasonably be expected to exist in some form across all countries and for which there are articulated norms as to what good data should look like. A number of our thematic modules are developed with partners who work on data issues in a particular sector. By collaborating with these partners, we are able to access their expertise on dataset definitions and make sure that the evidence we gather can be mobilised to support governments and other stakeholders in moving forward reforms to improve data governance, capability, availability, and use.
Three key elements of the GDB set it apart from other studies:
- Our primary data collection—we work with a network of independent expert researchers across more than 100 countries.
- Our global lens—we work to capture insights and bright spots from across the world, and to ensure full representation of low- and middle-income countries in both the design of our methodology and the collected data.
- Our specific focus on data—we address gaps in existing work that make it difficult to assess the extent to which countries are establishing data infrastructures that can support the public good.
On this last point, our goal is to build on and extend existing work. Many existing studies look at thematic laws, regulations, and even information availability. For example:
- The World Bank’s Benchmarking Public Procurement Data and Doing Business indices provide detailed information on procurement rules and information publication, and company registration processes respectively;
- The Open Budget Survey details the openness of key budget documents across 117 countries; and
- The IDEA Political Finance Database can tell you whether a country has rules on the publication of political party funds.
The value that the Global Data Barometer adds is to ask specifically about data. That is:
- Do legal or regulatory frameworks explicitly provide for information to be collected, managed, and made available as structured data?
- Is information actually being provided as structured shared or open data?
- Is that data being used?
Wherever possible, we take these existing studies as our starting point and then layer analysis through a data lens on top.
Our final country coverage will be determined through consultation with our regional research hubs.
We’ve designed the primary indicators in the Global Data Barometer survey to reflect what we learned across the editions of our predecessor project, the Open Data Barometer.
We frame most indicators around a ‘to what extent?’ question, scored on a 10-point scale. Typically that score consists of three parts:
- Existence: checking whether certain frameworks, provisions, or categories of data exist in the country. For example, a governance question may look at whether there is a data protection framework, awarding 1 point for the presence of any form of framework and 2 points for a framework with the force of law.
- Elements: checking the specific elements a framework, provision, or dataset has; each element is recorded separately and then summed to create an overall score. For example, a data availability question may look at whether the data is machine-readable, whether it’s openly licensed, and whether it contains key fields specific to a theme, such as corporate identifiers in a procurement dataset.
- Extent: checking whether what’s being evaluated demonstrates comprehensive coverage or limitations. For example, a capability question might ask whether a country’s data science training is available across the country or only in a limited number of locations. A data availability question might ask whether a dataset that scores highly in the elements checklist is an exceptional outlier in a federal system, or an example of the norm.
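One way the three parts could combine into a single 10-point score is sketched below. Only the 1-point and 2-point existence values come from the governance example above. Everything else is an assumption for illustration: the element names, the allocation of the remaining 8 points to the elements checklist, and the treatment of extent as a multiplier between 0 and 1. The survey's real scoring formula is not given in this document.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class IndicatorResponse:
    # 0 = nothing exists, 1 = some framework, 2 = framework with force of law
    existence: int
    # Checklist of elements; the names used below are hypothetical.
    elements: Dict[str, bool]
    # Assumed coverage multiplier: 1.0 = comprehensive, lower = limited.
    extent: float


def indicator_score(response, max_score=10.0, existence_max=2):
    """Combine existence, elements, and extent into one score.

    Assumed formula: (existence points + share of elements present
    scaled to the remaining points) multiplied by the extent factor.
    """
    if response.existence == 0:
        return 0.0
    element_points = max_score - existence_max
    if response.elements:
        share = sum(response.elements.values()) / len(response.elements)
    else:
        share = 0.0
    return (response.existence + element_points * share) * response.extent


# A data protection framework with the force of law, three of four
# hypothetical elements present, applied in only part of the country.
example = IndicatorResponse(
    existence=2,
    elements={
        "covers_public_sector": True,
        "covers_private_sector": True,
        "independent_regulator": True,
        "right_of_redress": False,
    },
    extent=0.5,
)
print(indicator_score(example))
```

Recording each element as a separate boolean, rather than storing only the summed score, keeps the detail available for later analysis of which specific provisions or dataset features are missing.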
The screenshots below show two examples of what this will look like in our survey tool: a draft governance question and a draft dataset availability question.
This design of indicators responds specifically to feedback on the Open Data Barometer that underscored the importance of investigating not only the presence and technical features of datasets, but also what policies and datasets contain. It also seeks to provide more nuance for the analysis of federal systems.
While we maintain that, from the perspective of users, it’s important to have access to data from across a whole country, we also seek to be sensitive to cases where sub-national units are at different levels of development in their data infrastructures than their national counterparts.
The GDB builds on learning from the Open Data Barometer (ODB). While the GDB updates both the set and structure of indicators used, we are working carefully to maintain broad comparability between the two studies.
We anticipate being able to create an Open Data sub-index of the GDB that can be treated as a continuation of the ODB time series.