A vision for a unified cancer imaging data infrastructure

At the HDI Council, we recognize initiatives like the European Cancer Imaging Initiative for their ambitious effort to advance healthcare innovation and the use of AI. EUCAIM (European Federation for Cancer Images), the initiative's cornerstone project, serves as an exemplary model of collaboration between data users and providers, aimed at sharing cancer-related imaging data. It tackles the fragmentation of existing repositories by establishing a unified infrastructure and streamlining the storage of and access to vital cancer data.

EUCAIM is a comprehensive project that aims to revolutionize the landscape of medical imaging data in the European Union. Its 76 partners, backed by €18 million in funding, are collaborating to deploy a pan-European federated digital infrastructure of de-identified cancer images that follow the FAIR principles. This infrastructure preserves data sovereignty while providing a platform for developing and benchmarking AI tools for precision medicine.

At the heart of EUCAIM lies a dual mission:

  1. to establish a unified infrastructure for storing and accessing cancer imaging data
  2. to address the fragmentation in existing repositories

It seeks to bridge the gap between diverse datasets scattered across various projects and institutions, enabling seamless collaboration and research endeavors.

The infrastructure features a hybrid hub-and-spoke architecture: federated nodes at the edges and a central node where the computation is carried out. This model allows data sharing while preserving governance, privacy and security, in alignment with the European Health Data Space framework. The whole architecture adheres to the FAIR principles: Findable, Accessible, Interoperable, and Reusable.

To share data, partners and data donors sign agreements and choose a level of data compliance in a tiered approach, according to whether they only want to send their data, share it or transfer it.

EUCAIM collects not only datasets but also tools for data processing and analysis, which are curated and shared among researchers and partners. Its online platform provides a federated catalog for browsing the available datasets and a marketplace for finding processing tools. This ecosystem facilitates collaboration and innovation by giving stakeholders access to advanced algorithms.

All datasets ingested into the platform go through a series of quality checks, and all algorithms are validated across several datasets before entering the marketplace. Work is ongoing to make the data as interoperable as possible. Notably, EUCAIM assists data providers by offering a set of guidelines for the data format, scripts to conform the data to the required structure, support in running those scripts and, for organizations that do not have computing clusters, some temporary computational resources.

A project of this scale can provide valuable guidance for other organizations or consortia that want to bring together multiple medical data sources under a unified infrastructure. EUCAIM teaches us the importance of:

  • following a robust set of guidelines and standards on how to organize the data and metadata, covering variable names, conventions, fields, vocabularies, and relationships between the databases;
  • having a robust data quality control system. For images in particular, this means running quality checks on formats and annotations under a clearly defined methodology, with rules for handling blurry or corrupted images and annotations that arrive in different formats.
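
To make the two lessons above concrete, here is a minimal Python sketch of rule-based checks. It is an illustration only, not EUCAIM's actual pipeline: the field names, the modality vocabulary and the function names are hypothetical, and the file check assumes the images are stored as DICOM Part 10 files, which the text above does not state. Detecting blurry images would need an image-processing library and is left out.

```python
# Hypothetical conventions, invented for illustration only.
REQUIRED_FIELDS = ("patient_id", "modality", "body_part")
ALLOWED_MODALITIES = {"CT", "MR", "PT", "MG"}


def check_metadata(record: dict) -> list:
    """Return a list of human-readable problems found in one metadata record."""
    problems = []
    # Rule 1: every required field must be present and non-blank.
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            problems.append(f"missing field: {name}")
    # Rule 2: the modality must come from the agreed vocabulary.
    modality = str(record.get("modality") or "").strip()
    if modality and modality not in ALLOWED_MODALITIES:
        problems.append(f"unknown modality: {modality}")
    return problems


def looks_like_dicom(data: bytes) -> bool:
    """Cheap corruption check, assuming DICOM Part 10 files:
    a 128-byte preamble must be followed by the marker b'DICM'."""
    return len(data) >= 132 and data[128:132] == b"DICM"


if __name__ == "__main__":
    record = {"patient_id": "p1", "modality": "XX"}
    for problem in check_metadata(record):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a data provider see and fix every issue in a record in a single pass.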

The vision of EUCAIM extends beyond healthcare: it strengthens Europe’s position in digital health and AI, promoting cross-border cooperation while upholding data sovereignty. We will follow the project as it evolves, and we hope for a future where data from all known diseases can be studied collaboratively to drive innovation and improve outcomes for patients.