Emerging technologies to support health data management

Digital technologies can support a range of data management activities, including data storage, standardisation, interoperability and privacy. The five technologies described below are increasingly being used to help manage health data.

Data repositories to store and share data

Data repositories are large online data stores that enable stakeholders to upload, share and access data. They help preserve data, can help track its provenance, and can help standardise the way data is described (its metadata). Repositories can also offer flexible access permissions, so that data can be shared across a spectrum: from internal use only, to sharing with partners and specific groups, to public or open access. Examples of data repositories used in the health sector include:

  • Global Health Data Repository Finder: A catalogue of available health data repositories used for general data uploads or for specific subjects and geographies.

  • figshare: A widely used repository for all types of research data, from qualitative social science data to biomedical data. It is among the repositories recommended by PLOS journals, including PLOS ONE.

  • Zenodo: Another widely used general-purpose repository, which scientific journals commonly accept as a home for the data underlying published research studies.

  • Vivli: A global platform for sharing clinical research data, specialising in clinical trial data. Pooling the results of multiple studies can create datasets large enough to support analyses with meaningful conclusions.
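Repositories commonly record a checksum for each uploaded file so that later copies can be checked against the original, which supports both preservation and provenance tracking. The Python sketch below illustrates that idea under stated assumptions: the record fields (`creator`, `derived_from` and so on) and the helper names `provenance_record` and `verify` are hypothetical, and real repositories each define their own metadata schemas.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(filename: str, data: bytes, creator: str,
                      derived_from: list[str]) -> dict:
    """Build a simple provenance entry for an uploaded file.

    All field names are illustrative assumptions, not any repository's schema.
    """
    return {
        "filename": filename,
        "sha256": sha256_of_bytes(data),  # fingerprint of the exact bytes uploaded
        "size_bytes": len(data),
        "creator": creator,
        "derived_from": derived_from,  # identifiers of upstream datasets, if any
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(data: bytes, record: dict) -> bool:
    """Check that a copy of the file matches the checksum recorded at upload."""
    return sha256_of_bytes(data) == record["sha256"]


payload = b"patient_id,age\n1,34\n2,51\n"
rec = provenance_record("cohort.csv", payload, "Example Clinic", ["example-dataset-v1"])
print(json.dumps(rec, indent=2))
print(verify(payload, rec))               # True
print(verify(payload + b"3,29\n", rec))   # False: the content has changed
```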

Tools to support data governance

There is an increasing number of software systems to help enterprises, health departments and others govern data. They use methods such as labelling and tagging to record important information like permissions for reuse (for example, consent for the use of personal data), standards that capture key information about each dataset to inform how it is managed, and data dictionaries that ensure terms are used consistently. In this way, these technologies can help implement the policies and processes agreed in a data governance framework. Examples of these tools include:

  • Open source data catalogs: A blog post describing open source projects released by leading technology organisations that share their internal data governance infrastructure and tooling, with a link to the GitHub repository for each example.

  • TransCelerate: A non-profit organisation that works with the pharmaceutical industry to manage and share research data.

  • EUCANConnect: The European Commission's experimental tooling platform, offering pilot tools for privacy and data governance.
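As a rough illustration of labelling, tagging and data dictionaries, the Python sketch below attaches governance tags to a dataset, checks a proposed reuse against a consent tag, and flags columns that are missing from a shared data dictionary. The `consent:<purpose>` tag convention, the dictionary entries and the function names are hypothetical; governance tools such as those above implement these ideas in their own ways.

```python
from dataclasses import dataclass, field

# Hypothetical shared data dictionary: agreed column names and their meaning.
DATA_DICTIONARY = {
    "patient_id": "Pseudonymised patient identifier (string)",
    "age_years": "Age at admission in completed years (integer)",
    "diagnosis_code": "Primary diagnosis code (string)",
}


@dataclass
class Dataset:
    name: str
    columns: list[str]
    # Governance labels, e.g. which reuse purposes participants consented to.
    tags: set[str] = field(default_factory=set)


def undefined_columns(ds: Dataset) -> list[str]:
    """Return columns that are not defined in the shared data dictionary."""
    return [c for c in ds.columns if c not in DATA_DICTIONARY]


def may_reuse(ds: Dataset, purpose: str) -> bool:
    """Allow reuse only if the dataset carries a consent tag for this purpose.

    The 'consent:<purpose>' tag convention is an illustrative assumption.
    """
    return f"consent:{purpose}" in ds.tags


admissions = Dataset(
    name="admissions_2023",
    columns=["patient_id", "age_years", "diagnosis_code", "ward"],
    tags={"consent:research", "contains:personal-data"},
)

print(undefined_columns(admissions))      # ['ward']
print(may_reuse(admissions, "research"))  # True
print(may_reuse(admissions, "marketing")) # False
```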

Tools to support standardisation

Standards for data and data models set out agreed ways to collect, use and share data, for example the language, concepts, rules and guidance to apply. They may be embedded in software or described in documents. A number of data standards help make health data interoperable, so that a variety of stakeholders can use it in a consistent, comparable way. These include:

  • DataCite assigns a digital object identifier (DOI) to each dataset, giving it a persistent, globally unique link through which it can be cited and shared.

  • Metadata standards help ensure datasets are described consistently. Examples include vocabularies such as schema.org and DCAT, and dataset description profiles such as HCLS (Health Care and Life Sciences).

  • There are also standardised data models and coding systems for how health data should be organised, including DHIS2, OMOP, LOINC, PCORnet, DICOM and others.

  • Industry solutions such as the Google Cloud Healthcare Engine seek to make datasets interoperable by transforming them, as they are ingested, into a format that conforms to the FHIR (Fast Healthcare Interoperability Resources) application programming interface (API) standard.
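To make the idea of transforming data into FHIR more concrete, the Python sketch below maps one flat source record onto a FHIR R4 `Observation` resource, coding the measurement with LOINC and the unit with UCUM. It is a hand-written illustration, not how any particular product performs the conversion: the source field names and the `LOCAL_TO_LOINC` mapping table are assumptions, and a production pipeline would also validate its output against the relevant FHIR profiles.

```python
import json

LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"

# Hypothetical mapping from a source system's local test names to LOINC codes.
LOCAL_TO_LOINC = {
    "WEIGHT": ("29463-7", "Body weight"),
    "PULSE": ("8867-4", "Heart rate"),
}

# UCUM codes for the units used in the hypothetical source data.
UNIT_TO_UCUM = {"kg": "kg", "bpm": "/min"}


def to_fhir_observation(row: dict) -> dict:
    """Convert one flat source record into a FHIR Observation (as a plain dict)."""
    loinc_code, display = LOCAL_TO_LOINC[row["test"]]
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": LOINC, "code": loinc_code, "display": display}]},
        "subject": {"reference": f"Patient/{row['patient_id']}"},
        "effectiveDateTime": row["measured_at"],
        "valueQuantity": {
            "value": row["value"],
            "unit": row["unit"],               # human-readable unit from the source
            "system": UCUM,
            "code": UNIT_TO_UCUM[row["unit"]],  # machine-readable UCUM code
        },
    }


source_row = {"patient_id": "123", "test": "WEIGHT", "value": 72.5,
              "unit": "kg", "measured_at": "2023-05-01T09:30:00Z"}
print(json.dumps(to_fhir_observation(source_row), indent=2))
```

Keeping the local-to-standard code mapping in its own table, separate from the conversion logic, makes it easier to review and extend as new source systems are added.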

Tools to support data integration

Data can be made accessible to others via APIs. APIs allow datasets to be integrated directly into other systems, so that users can access the data relevant to their purpose rather than downloading and re-uploading entire datasets. Because an API serves the current version of the data, users are less likely to work from an outdated copy. APIs can also enforce access conditions: permissions can be set at a granular level, so that different users receive different levels of access depending on the unique identifier they present (known as an API key).
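Granular, key-based access of this kind can be sketched as follows. This is a minimal Python illustration with hypothetical keys, access levels and fields; a real API would run the check on the server for every request and would not hard-code keys in source code.

```python
# Hypothetical registry of API keys and the access level each one grants.
API_KEYS = {
    "key-public-abc": "public",
    "key-partner-def": "partner",
    "key-internal-ghi": "internal",
}

# Fields visible at each access level (hypothetical).
FIELDS_BY_LEVEL = {
    "public": {"region", "year", "case_count"},
    "partner": {"region", "year", "case_count", "age_band"},
    "internal": {"region", "year", "case_count", "age_band", "facility_id"},
}

# The current data held by the API provider (illustrative values).
RECORDS = [
    {"region": "North", "year": 2023, "case_count": 120,
     "age_band": "40-49", "facility_id": "F-17"},
    {"region": "South", "year": 2023, "case_count": 87,
     "age_band": "50-59", "facility_id": "F-04"},
]


class AccessDenied(Exception):
    pass


def get_records(api_key: str) -> list[dict]:
    """Return the current records, filtered to the fields the caller's key allows."""
    level = API_KEYS.get(api_key)
    if level is None:
        raise AccessDenied("unknown API key")
    allowed = FIELDS_BY_LEVEL[level]
    return [{k: v for k, v in r.items() if k in allowed} for r in RECORDS]


print(get_records("key-public-abc")[0])
# {'region': 'North', 'year': 2023, 'case_count': 120}
```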

APIs can be developed as needed by individuals or organisations, but there are also a number of open API standards, such as the OpenAPI Specification. API standards ensure that different systems using the same API access, modify or create data items in a consistent way.
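To show what an API standard such as OpenAPI captures, the sketch below builds a minimal OpenAPI 3.0 description of a hypothetical endpoint that returns aggregated case counts and requires an API key in a request header. It is written as a Python dictionary and printed as JSON; the path `/records`, the `X-API-Key` header name and the `Record` schema are illustrative assumptions, not part of any existing health API.

```python
import json

# Minimal, hypothetical OpenAPI 3.0 description of a single read-only endpoint.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example Health Statistics API", "version": "1.0.0"},
    "paths": {
        "/records": {
            "get": {
                "summary": "List aggregated case counts",
                "parameters": [
                    {"name": "year", "in": "query", "required": False,
                     "schema": {"type": "integer"}}
                ],
                "security": [{"ApiKeyAuth": []}],  # this operation needs an API key
                "responses": {
                    "200": {
                        "description": "Records visible to the caller's API key",
                        "content": {"application/json": {"schema": {
                            "type": "array",
                            "items": {"$ref": "#/components/schemas/Record"},
                        }}},
                    },
                    "401": {"description": "Missing or unknown API key"},
                },
            }
        }
    },
    "components": {
        # Declares how the API key is sent: in a request header.
        "securitySchemes": {
            "ApiKeyAuth": {"type": "apiKey", "in": "header", "name": "X-API-Key"}
        },
        "schemas": {
            "Record": {
                "type": "object",
                "properties": {
                    "region": {"type": "string"},
                    "year": {"type": "integer"},
                    "case_count": {"type": "integer"},
                },
                "required": ["region", "year", "case_count"],
            }
        },
    },
}

print(json.dumps(spec, indent=2))
```

Because the description is machine-readable, tools that understand OpenAPI can use it to generate documentation or client code, which is one way such standards help different systems work with the same API consistently.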

Health-specific API standards include:

Privacy enhancing technologies

Privacy enhancing technologies (PETs) enable data to be used while protecting the identity of the individuals it describes. Good practices for adopting PETs are still emerging. PETs include:

  • Platforms that allow data to be analysed securely without it being removed from the platform, often referred to as "trusted research environments". For example, EUCANConnect's DataSHIELD project aims to enable analysis of research data without the data leaving its storage source.

  • Personal data stores allow individuals to hold their data on a platform and set agreements on who can access it. These range from proprietary offerings built on blockchain technologies, such as meeco.me, to platforms designed by health authorities, such as MyKanta in Finland.
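The principle behind analysing data without removing it can be illustrated with a federated calculation: each data holder returns only aggregates, and refuses if a group is small enough to risk identifying individuals. The Python sketch below is a simplified illustration of that idea, not DataSHIELD's actual interface (DataSHIELD itself is used from R). The `Site` class, the `MIN_CELL_SIZE` threshold of 5 and the `federated_mean` helper are hypothetical, and in a real deployment the boundary between analyst and data would be enforced by the platform and network rather than by a Python object.

```python
# Minimum number of records a site will aggregate over (illustrative threshold).
MIN_CELL_SIZE = 5


class DisclosureError(Exception):
    pass


class Site:
    """A data holder whose row-level records are never returned to the analyst."""

    def __init__(self, name: str, ages: list[float]):
        self.name = name
        self._ages = ages  # row-level data, private by convention only in this sketch

    def summary(self) -> tuple[int, float]:
        """Return only (count, sum), refusing if the group is too small."""
        if len(self._ages) < MIN_CELL_SIZE:
            raise DisclosureError(f"{self.name}: fewer than {MIN_CELL_SIZE} records")
        return len(self._ages), sum(self._ages)


def federated_mean(sites: list[Site]) -> float:
    """Combine per-site aggregates into a pooled mean without seeing any rows."""
    summaries = [s.summary() for s in sites]
    total_n = sum(n for n, _ in summaries)
    total_sum = sum(t for _, t in summaries)
    return total_sum / total_n


site_a = Site("A", [30, 40, 50, 60, 70])
site_b = Site("B", [20, 30, 40, 50, 60, 80, 100])
print(federated_mean([site_a, site_b]))  # 52.5

try:
    federated_mean([site_a, Site("C", [33, 44])])
except DisclosureError as err:
    print("refused:", err)
```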
