# Make a data inventory

**A data inventory exercise can help you identify the range of datasets that could be used to tackle a problem, and decide which type of data infrastructure needs to be designed or strengthened to improve these data assets and access to them.**

A data inventory is a list of datasets annotated with important information (known as metadata) that helps people understand why the data was collected, what it contains, how it is managed and how it is made available for others to use. For detailed guidance on how to create a data inventory, you can use this [**checklist on how to create a data inventory**](https://www.datasharingtoolkit.org/wp-content/uploads/2021/03/CABI-Mod7-3-01-Checklist.pdf) developed by the Centre for Agriculture and Bioscience International (CABI) and the ODI with the support of the Bill & Melinda Gates Foundation.

**Data inventories can be used to:**

* assess the quality of the data available
* identify data assets that might contain personal or commercially sensitive data
* undertake a robust analysis of the sector's data
* find relevant data easily
* assess the range of data owners in the sector relevant to the initiative
* make recommendations to improve access to data.

**To decide on which datasets to include, you might want to consider:**

* what kind of data you are looking for based on your problem statement
* the geographical area you want to work in
* how recent the publications and datasets are
* the credibility and reliability of the data source
* the methodological soundness of the data collection approach
* if and how the data is licensed.
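
Some of the criteria above can be turned into a simple screening step. The following is a minimal sketch, assuming a hypothetical `Candidate` record with fields for region, last-updated year and licence. The field names, thresholds and sample entries are illustrative and do not come from the playbook.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A candidate dataset; all field names are hypothetical."""
    name: str
    region: str
    last_updated: int          # year of the most recent release
    licence: Optional[str]     # None when no licence could be found


def screen(candidates, region, min_year):
    """Split candidates into those meeting the criteria and those needing review."""
    keep, review = [], []
    for c in candidates:
        in_scope = c.region in (region, "global") and c.last_updated >= min_year
        if in_scope and c.licence is not None:
            keep.append(c)
        elif in_scope:
            # In scope but unlicensed: reuse rights are unclear, so flag it.
            review.append(c)
    return keep, review


# Illustrative entries only, not real datasets.
sample = [
    Candidate("Dataset A", "global", 2020, "CC-BY-4.0"),
    Candidate("Dataset B", "europe", 2012, "CC-BY-4.0"),
    Candidate("Dataset C", "europe", 2021, None),
]
keep, review = screen(sample, region="europe", min_year=2015)
```

Criteria such as the credibility of the source and the soundness of the collection method still need human judgement, so a filter like this can only narrow the list, not finalise it.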

## Example data inventory

| **Field**           | **Data asset**                              | **Description**                                                                                 | **Location**                                                                                  | **License**                                                                                              |
| ------------------- | ------------------------------------------- | ----------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- |
| **Land Use**        | **Land use dataset**                        | **Datasets describing cultivated areas around the world**                                       | [**EarthStat**](http://www.earthstat.org/)                                                    | [**CC-BY 4.0 License**](https://creativecommons.org/licenses/by/4.0/)                                    |
| **Soil**            | **Soil maps**                               | **Soil data describing characteristics and classes provided by ISRIC’s world soil information** | [**SoilGrids**](https://soilgrids.org/)                                                       | [**CC-BY 4.0 License**](https://creativecommons.org/licenses/by/4.0/)                                    |
| **Access to water** | **Water sources map and location datasets** | **EU’s open data portal datasets related to water sources to support agricultural projects**    | [**Waterbase - Water Quality (EU)**](https://data.europa.eu/euodp/en/data/dataset/DAT-163-en) | [**CC-BY 4.0 License**](https://creativecommons.org/licenses/by/4.0/)                                    |
| **...**             | **...**                                     | **...**                                                                                         | **...**                                                                                       | **...**                                                                                                  |
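
An inventory like the one above can also be kept in a machine-readable form. The following is a minimal sketch that stores the three example rows as dictionaries and serialises them to CSV with Python's standard `csv` module. The column keys (`field`, `data_asset` and so on) are assumed names chosen for illustration, not a prescribed schema.

```python
import csv
import io

# Column keys are assumptions that mirror the table headings above.
FIELDS = ["field", "data_asset", "description", "location", "licence"]

inventory = [
    {
        "field": "Land use",
        "data_asset": "Land use dataset",
        "description": "Datasets describing cultivated areas around the world",
        "location": "http://www.earthstat.org/",
        "licence": "CC-BY 4.0",
    },
    {
        "field": "Soil",
        "data_asset": "Soil maps",
        "description": "Soil data describing characteristics and classes "
                       "provided by ISRIC's world soil information",
        "location": "https://soilgrids.org/",
        "licence": "CC-BY 4.0",
    },
    {
        "field": "Access to water",
        "data_asset": "Water sources map and location datasets",
        "description": "EU's open data portal datasets related to water "
                       "sources to support agricultural projects",
        "location": "https://data.europa.eu/euodp/en/data/dataset/DAT-163-en",
        "licence": "CC-BY 4.0",
    },
]


def to_csv(rows):
    """Serialise inventory rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(inventory)
```

A flat format such as CSV is easy to open in a spreadsheet and to publish alongside the human-readable table.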

**Data inventories are very useful for understanding available datasets.** However, they will not necessarily capture everything, especially how the data has been used. Data infrastructure is not neutral; it needs to be understood in context so that its limitations are clear.

**It is often useful to publish data inventories as open data**, but the information they contain can go stale quickly as the availability of datasets changes. We recommend adding context on how and when the data inventory was created and how the information it contains was collected. When publishing a data inventory, you might also explain what data is in scope, what is missing, how the inventory will be maintained, and any legal and ethical considerations around its reuse. You should consider either [**maintaining data inventories collaboratively**](https://collaborative-data.theodi.org/) with other stakeholders or treating their creation as a one-off research exercise.
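
The contextual information described above can be published as a small metadata record next to the inventory itself. The following is a minimal sketch, assuming a hypothetical helper `describe_inventory` and hypothetical key names. The values passed in the example call are placeholders, not details of any real inventory.

```python
import json
from datetime import date


def describe_inventory(created_on, method, in_scope, known_gaps,
                       maintenance, reuse_notes):
    """Return a JSON string giving context for a published inventory.

    All key names are hypothetical; adapt them to your own conventions.
    """
    context = {
        "created_on": created_on.isoformat(),  # when the inventory was compiled
        "method": method,                      # how the information was collected
        "in_scope": in_scope,                  # what data the inventory covers
        "known_gaps": known_gaps,              # what is missing
        "maintenance": maintenance,            # how (or whether) it is updated
        "reuse_notes": reuse_notes,            # legal and ethical considerations
    }
    return json.dumps(context, indent=2)


# Placeholder values for illustration only.
notes = describe_inventory(
    created_on=date(2024, 1, 15),
    method="Desk research of public data portals",
    in_scope=["land use", "soil", "access to water"],
    known_gaps=["datasets that are not published online"],
    maintenance="One-off exercise; not updated after publication",
    reuse_notes="Check each dataset's own licence before reuse",
)
print(notes)
```

Recording the creation date and maintenance approach explicitly lets readers judge how stale the inventory might be.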

{% hint style="info" %}
**Special focus: Open data licensing**

How data is licensed affects the ability of other people and organisations to use it. Open data licensing allows the maximum number of people to use the data with few or no restrictions. There are three standard Creative Commons licences that can be used to publish open data:

[**CC0**](https://creativecommons.org/share-your-work/public-domain/cc0/) **– No Rights Reserved. There are no restrictions on how re-users can use the data.**

[**CC-BY**](https://creativecommons.org/licenses/by/2.0/) **– People who use the data must credit whoever is publishing it (this is called attribution).**

[**CC-BY-SA**](https://creativecommons.org/licenses/by-sa/2.0/) **– People who mix the data with other data must release the results as open data under the same licence terms (this is called share-alike), and must also attribute the original data.**
{% endhint %}
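
When an inventory records a licence for each dataset, the obligations summarised above can be looked up programmatically. The following is a minimal sketch, assuming the three licence labels used in the hint. The `LICENCE_TERMS` table and the `obligations` helper are hypothetical names, and the lookup is a simplification that does not replace reading the licence text.

```python
# Simplified summary of the three licences described above.
LICENCE_TERMS = {
    "CC0": {"attribution": False, "share_alike": False},
    "CC-BY": {"attribution": True, "share_alike": False},
    "CC-BY-SA": {"attribution": True, "share_alike": True},
}


def obligations(licence):
    """List what a re-user must do under the given licence label."""
    terms = LICENCE_TERMS.get(licence)
    if terms is None:
        # Anything outside the table needs a manual check.
        return ["unknown licence: check the terms before reuse"]
    duties = []
    if terms["attribution"]:
        duties.append("credit the publisher")
    if terms["share_alike"]:
        duties.append("release derived data under the same licence")
    return duties


for label in ("CC0", "CC-BY", "CC-BY-SA"):
    print(label, obligations(label))
```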


