Make a data inventory

A data inventory exercise can help you to identify the range of datasets that could be used to tackle a problem and decide on which type of data infrastructure needs to be designed or strengthened to improve these data assets and access to them.

A data inventory is a list of datasets annotated with important information (known as metadata) that can help people understand why data has been collected, what it contains, how it is managed and the ways it is made available for others to use. For detailed guidance, you can use the checklist on how to create a data inventory developed by the Centre for Agricultural Bioscience International (CABI) and the ODI with the support of the Bill and Melinda Gates Foundation.

Data inventories can be used to:

  • assess the quality of the data available

  • identify data assets that might contain personal or commercially-sensitive data

  • undertake a robust analysis of the sector data

  • find relevant data easily

  • assess the range of data owners in the sector relevant to the initiative

  • make recommendations to improve access to data.

To decide on which datasets to include, you might want to consider:

  • what kind of data you are looking for based on your problem statement

  • the geographical area you want to work in

  • how recent the publications and datasets are

  • the credibility and reliability of the data source

  • the methodological soundness of the data collection approach

  • if and how the data is licensed.

Example Data Inventory

| Field | Data asset | Description | Location | License |
|---|---|---|---|---|
| Land Use | Land use dataset | Datasets describing cultivated areas around the world | | |
| Soil | Soil maps | Soil data describing characteristics and classes provided by ISRIC’s world soil information | | |
| Access to water | Water sources map and location datasets | EU’s open data portal datasets related to water sources to support agricultural projects | | |
| ... | ... | ... | ... | ... |
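As an illustration only, the example inventory above could be kept in a machine-readable form. The sketch below is one possible approach, not a prescribed format: the `InventoryEntry` class, its attribute names and the two helper functions are hypothetical, chosen to mirror the column headings. The location and licence values are left blank because the example does not supply them.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryEntry:
    """One row of a data inventory (attribute names are illustrative)."""
    field: str
    data_asset: str
    description: str
    location: str = ""  # where the data can be found; blank in this example
    license: str = ""   # for example "CC-BY"; blank in this example


entries = [
    InventoryEntry(
        "Land Use",
        "Land use dataset",
        "Datasets describing cultivated areas around the world",
    ),
    InventoryEntry(
        "Soil",
        "Soil maps",
        "Soil data describing characteristics and classes "
        "provided by ISRIC’s world soil information",
    ),
    InventoryEntry(
        "Access to water",
        "Water sources map and location datasets",
        "EU’s open data portal datasets related to water sources "
        "to support agricultural projects",
    ),
]


def to_csv(rows):
    """Serialise inventory entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(InventoryEntry)]
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


def missing_license(rows):
    """List the data assets whose licence has not been recorded yet."""
    return [row.data_asset for row in rows if not row.license]


if __name__ == "__main__":
    print(to_csv(entries))
    print("Licence still to be recorded for:", missing_license(entries))
```

Keeping the inventory as structured records makes it easier to spot gaps, such as datasets with no recorded licence, before the inventory is shared.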

Data inventories are very useful for understanding available datasets. However, they will not necessarily capture everything, especially how the data has been used. Data infrastructure is not neutral; it needs to be understood in context so that its limitations are clear.

It is often useful to publish data inventories as open data, but the information they contain can go stale rapidly as the availability of different datasets changes. We recommend adding information to give context on how and when the data inventory was created and how and when the information it contains was collected. When publishing a data inventory, you might also explain what data is in scope, what is missing, how the inventory will be maintained, and any particular legal and ethical considerations around its reuse. You should consider maintaining data inventories collaboratively with other stakeholders, rather than treating their creation as a one-off research exercise.
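One way to keep track of staleness is to record when each entry was last checked and flag the ones that are overdue for review. The sketch below assumes this approach; the `stale_entries` function, the 180-day threshold and the dates are all hypothetical values chosen for illustration, not recommendations.

```python
from datetime import date, timedelta


def stale_entries(last_checked, today, max_age=timedelta(days=180)):
    """Return the names of entries last checked more than max_age ago."""
    return sorted(
        name
        for name, checked in last_checked.items()
        if today - checked > max_age
    )


# Hypothetical review dates for two entries in an inventory.
last_checked = {
    "Land use dataset": date(2024, 1, 10),
    "Soil maps": date(2024, 6, 1),
}

if __name__ == "__main__":
    # With these dates, only the land use dataset is older than 180 days.
    print(stale_entries(last_checked, today=date(2024, 9, 1)))
```

Publishing the review date alongside each entry also gives reusers the context they need to judge how current the inventory is.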

Special focus: Open data licensing

How data is licensed will affect the ability of other people and organisations to use the data. Open data licensing allows the maximum number of people to use the data with few or no restrictions. There are three standard Creative Commons licences which can be used to open data:

CC0 – No Rights Reserved. There are no restrictions on how re-users can use the data.

CC-BY – People who use the data must credit whoever is publishing it (this is called attribution).

CC-BY-SA – People who use the data must attribute it, and people who mix the data with other data also have to release the results as open data under the same licence (this is called share-alike).
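The obligations attached to the three licences can be summarised as a simple lookup, which may be useful when filling in the licence column of an inventory. This is a minimal sketch based only on the descriptions above; the `LICENCES` table and the `obligations` function are hypothetical names, and the sketch is not a substitute for reading the licence texts.

```python
# Obligations placed on reusers, following the descriptions above.
LICENCES = {
    "CC0": {"attribution": False, "share_alike": False},
    "CC-BY": {"attribution": True, "share_alike": False},
    "CC-BY-SA": {"attribution": True, "share_alike": True},
}


def obligations(licence):
    """Return the obligations a reuser must meet under the given licence."""
    terms = LICENCES[licence]
    return [name for name, required in terms.items() if required]


if __name__ == "__main__":
    for name in LICENCES:
        print(name, obligations(name))
```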
