The federated learning landscape is evolving incredibly fast. This is in large part thanks to its fantastic community, which keeps pushing the boundaries of what decentralized analysis can do. Moreover, the very nature of this technology encourages collaboration across different parties.
As one of these collaborations, we are thrilled to announce a partnership between vantage6 and DataSHIELD. In case you don’t know, DataSHIELD is an R-based software package for federated analyses of biomedical, healthcare and social-science data. Established in 2013, it is an innovative, game-changing initiative that is being continuously developed by the federated learning community. The DataSHIELD team is based at Newcastle University, but development has since spread to a wider community centered on the EUCAN-Connect project. You can learn more about them here.
What is the purpose of this partnership?
Our objective is to leverage the best of these two solutions: we want to make the extensive DataSHIELD toolset available through vantage6. In other words, users of vantage6 will be able to use the well-curated and carefully reviewed analysis methods in DataSHIELD, all while keeping the data of each party safe. To go into a little more detail, Fig. 1 shows the proposed general concept.
Fig. 1. Proposed concept of the synergy between vantage6 and DataSHIELD.
In this scenario, vantage6 would provide the framework for the whole project. This includes managing collaborations, authorizing users, handling computation requests, delivering algorithms, storing results, and setting up the server (if needed) and nodes. Depending on the configuration, a node could communicate with a server or with other nodes.
At its core, each node would accommodate two DataSHIELD Docker images: a client and a gateway. On the one hand, the client would contain the analysis scripts (i.e., the actual algorithm used to answer the research question being investigated). It is worth mentioning that DataSHIELD has already implemented a large toolbox of relevant operations and functions. On the other hand, the gateway would be configured to access the data (through a platform such as Opal or Armadillo). This way, parties could approve only Docker images that comply with their privacy requirements, ensuring that the algorithms have only non-disclosive access to the data.
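To make the approval idea a bit more concrete, here is a minimal Python sketch of how a node might check that both the client and the gateway image of a computation request match a list of patterns approved by the data owner. This is an illustration only, not vantage6 or DataSHIELD code: the class names `NodePolicy` and `Task`, the function `accept_task`, and the `registry.example.org` image names are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodePolicy:
    """Hypothetical per-node policy: regex patterns for approved Docker images."""
    allowed_image_patterns: List[str] = field(default_factory=list)

    def is_approved(self, image: str) -> bool:
        # fullmatch: the whole image name must match, not just a substring
        return any(
            re.fullmatch(pattern, image) is not None
            for pattern in self.allowed_image_patterns
        )


@dataclass
class Task:
    """Hypothetical computation request naming the two images a node would run."""
    client_image: str   # holds the analysis scripts
    gateway_image: str  # configured to access the data (e.g. via Opal or Armadillo)


def accept_task(policy: NodePolicy, task: Task) -> bool:
    """Accept a task only if BOTH of its images are approved by this node."""
    return policy.is_approved(task.client_image) and policy.is_approved(
        task.gateway_image
    )


# Example policy (made-up registry and image names)
policy = NodePolicy(
    allowed_image_patterns=[
        r"registry\.example\.org/datashield/client:1\.\d+",
        r"registry\.example\.org/datashield/gateway:1\.\d+",
    ]
)

approved = accept_task(
    policy,
    Task(
        client_image="registry.example.org/datashield/client:1.2",
        gateway_image="registry.example.org/datashield/gateway:1.0",
    ),
)

rejected = accept_task(
    policy,
    Task(
        client_image="registry.example.org/unknown/client:1.2",
        gateway_image="registry.example.org/datashield/gateway:1.0",
    ),
)
```

In this sketch, the first request is accepted because both images match an approved pattern, while the second is refused because its client image does not. How approval would actually be configured in the final design is still open.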
Please note that this is a work in progress and is subject to change. If you want to be part of the conversation or, even better, want to actively contribute, don’t hesitate to reach out!
When 1 + 1 = 3
We believe that the synergy between these two solutions will greatly benefit the federated learning field by giving researchers even more (open-source) options for running their analyses. We are very excited about this partnership and are really looking forward to seeing the value that these tools will bring to the community.
Frank Martin: Scientific Programmer
Arturo Moncada-Torres: Clinical Data Scientist
Gijs Geleijnse: Sr. Clinical Data Scientist
Stuart Wheater: Member of DataSHIELD team
Prof. Paul Burton: PI of DataSHIELD
Yannick Marcon: Maintainer of the OBiBa open-source projects
Prof. Dr. Morris Swertz: Head of Genomics Coordination Center and MOLGENIS / Professor of Bioinformatics
Sido Haakma: Member of MOLGENIS team
- Arturo Moncada-Torres, Frank Martin, et al. "VANTAGE6: an open source priVAcy preserviNg federaTed leArninG infrastructurE for Secure Insight eXchange." AMIA Annual Virtual Symposium Proceedings (2020): 870-877.
Available on https://vantage6.ai/vantage6/
- Amadou Gaye, Yannick Marcon, et al. "DataSHIELD: taking the analysis to the data, not the data to the analysis." International journal of epidemiology 43.6 (2014): 1929-1944.
Available on https://academic.oup.com/ije/article/43/6/1929/707730
- vantage6 GitHub repository
- DataSHIELD GitHub repository
- Opal platform
Product page https://www.obiba.org/pages/products/opal/
Opal papers https://www.obiba.org/pages/publications/
- Armadillo platform