Step 3. Consider how technology can facilitate data sharing and access

Alongside agreements for sharing data, technology can help curate and manage access in accordance with the conditions set out in the licence or contracts. See Box 5 for an example of this.
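As a minimal sketch of what managing access "in accordance with the conditions" might look like in practice, the Python below writes a few licence conditions down as data and checks each access request against them before access is granted. The condition fields (`permitted_purposes`, `approved_organisations`, `expires`), the `AccessRequest` structure and the `check_access` function are hypothetical illustrations. They are not taken from any specific licence, contract or product, and real agreements will set their own terms.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical encoding of conditions taken from a data sharing licence or contract.
@dataclass(frozen=True)
class LicenceConditions:
    permitted_purposes: frozenset[str]
    approved_organisations: frozenset[str]
    expires: date


# Hypothetical description of a single request to access the dataset.
@dataclass(frozen=True)
class AccessRequest:
    organisation: str
    purpose: str
    requested_on: date


def check_access(request: AccessRequest, conditions: LicenceConditions) -> tuple[bool, str]:
    """Return (granted, reason), refusing any request that breaks a condition."""
    if request.requested_on > conditions.expires:
        return False, "licence has expired"
    if request.organisation not in conditions.approved_organisations:
        return False, "organisation is not named in the agreement"
    if request.purpose not in conditions.permitted_purposes:
        return False, f"purpose '{request.purpose}' is not permitted"
    return True, "request meets the licence conditions"


# Illustrative conditions for a made-up agreement.
conditions = LicenceConditions(
    permitted_purposes=frozenset({"research", "service planning"}),
    approved_organisations=frozenset({"Example University"}),
    expires=date(2030, 12, 31),
)

print(check_access(AccessRequest("Example University", "research", date(2025, 1, 15)), conditions))
print(check_access(AccessRequest("Example University", "marketing", date(2025, 1, 15)), conditions))
```

Keeping the conditions as data, separate from the checking logic, means that a change to the agreement only requires the recorded conditions to be updated, not the code.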

When considering whether a new system or interface can help in a project, it is important to be driven by the needs of the users and the principles set out in the data governance framework, rather than the features of the technology product.

Common ways technology can facilitate secure data sharing and access include:

  • To help users find data. When combined with open licences, data repositories can be used to publish key details about datasets (known as metadata) and make these discoverable through search engines.

  • To improve access to data. Access to data can be improved through application programming interfaces (APIs). An API is a connector that links two or more systems together, and APIs are a common way to access datasets. Datasets made available via an API have advantages over large one-off downloads: they can remain constantly in sync with the original dataset and pick up any updates to the data as queries are run and the data is analysed. This avoids the risk of working with an outdated version of the dataset. Also, given the large size of many health datasets, an API lets users query or filter just the part of the dataset they need, rather than requiring everyone to download the complete dataset. Access to data can also be improved through a data repository, data library or data archive, where data can be stored and conditions set to control access.

  • To facilitate use of data. Data platforms provide a greater range of features for sharing and using data, including sandboxes (see below). Platforms tend to be more resource-intensive to set up, and often require some internal engineering expertise to help build the components. Data providers making use of platforms may need to devote resources to maintaining the data platform, including ensuring it remains performant on an ongoing basis.

  • To support innovation. Data sandboxes are emerging technologies that allow access to data within a secure, controlled environment, where data can be analysed or used within the sandbox but cannot be downloaded or removed from the secure environment.
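To illustrate the point about APIs letting users request only the part of a dataset they need, the Python sketch below builds a filtered query URL and uses a small local function to stand in for the provider's side of the API. The endpoint `https://api.example.org/v1/datasets/hospital-admissions/records`, the `region` and `year` parameters, and the records are all hypothetical, made-up values; a real provider documents its own URLs and parameters.

```python
from urllib.parse import urlencode

# Hypothetical API endpoint; a real provider documents its own URL and parameters.
BASE_URL = "https://api.example.org/v1/datasets/hospital-admissions/records"


def build_query(base_url: str, **filters: str) -> str:
    """Build a URL asking the API for only the rows that match the filters."""
    return f"{base_url}?{urlencode(filters)}"


# Stand-in for the provider's server: the full dataset stays with the provider.
# The figures below are made up for illustration.
FULL_DATASET = [
    {"region": "North", "year": "2021", "admissions": 120},
    {"region": "North", "year": "2022", "admissions": 135},
    {"region": "South", "year": "2022", "admissions": 98},
]


def serve_query(filters: dict[str, str]) -> list[dict]:
    """Return only the records whose fields match every filter value."""
    return [
        row for row in FULL_DATASET
        if all(str(row.get(key)) == value for key, value in filters.items())
    ]


filters = {"region": "North", "year": "2022"}
print(build_query(BASE_URL, **filters))
print(serve_query(filters))
```

Because each query is answered from the provider's current copy of the data, any rows added or corrected on the provider's side appear in the next query's results. This is the "staying in sync" advantage over a downloaded file, and the user only ever receives the filtered rows.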

We are seeing emerging models that give individuals greater control over their personal data, the ability to use data about them for their primary healthcare and to contribute altruistically to healthcare research or initiatives. These models include technological and regulatory approaches such as:

The Appendix to this play includes some helpful tips on choosing technologies such as data repositories and data platforms.

Key questions to consider:

  • What are the user needs for access, use and sharing of data?

  • Is technology needed to support these needs?

  • Does something suitable already exist (for example through project partners)?

  • What budget and resources do you have, and do they cover ongoing maintenance and access?

Box 5: How technology can facilitate data sharing

GISAID

GISAID is a non-profit, public-private partnership. Its platform, which focuses on genome sequencing, allows variants to be tracked. It is used by governments to inform their response to disease outbreaks, and by the healthcare industry to shape vaccine design and therapeutic interventions.

The platform uses standards to ensure data can be compared and aggregated, and provides a range of data visualisation and dissemination tools. One element critical to the success of the GISAID initiative is that it provides a fair, transparent, verifiable and unbiased mechanism, not only for governance but also for taking measures to guard against bias in decision-making, thereby preserving scientific independence.

Over 190 countries have shared genomic sequences of COVID-19 variants on the GISAID platform, and to date two million genomes have been catalogued. It is the most trusted platform for sharing COVID-19 genome data, and African and South American scientific contributions to it more than doubled between January and April 2021.
