The LF AI & Data Foundation has taken another open source project under its wing: Substra, which must first prove itself in the foundation's incubation stage. The framework is aimed at data scientists and machine learning specialists who want to run distributed ML projects across teams and companies without giving up the confidentiality of their respective data sets.
For trusted collaboration
Substra was developed by the US company Owkin, which specializes in the use of artificial intelligence in medical research. The focus is on federated machine learning over decentralized data sets, an approach that meets the stricter data protection requirements in medicine. Substra gives data scientists the functions they need to set up federated learning projects without having to expose their data “outside their own firewall”.
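To illustrate the general idea of federated learning, here is a minimal sketch of federated averaging in plain Python. It does not use Substra's API; the function names, the two example sites, and the toy linear model are assumptions made purely for illustration. Each site trains on its own data, and only the resulting model weights are shared and averaged.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held by one site


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.1, epochs: int = 10) -> List[float]:
    """Run gradient descent for y = w*x + b using one site's data only."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: Sequence[Sequence[float]],
                      sizes: Sequence[int]) -> List[float]:
    """Average the site models, weighted by each site's sample count."""
    total = sum(sizes)
    return [
        sum(u[i] * s for u, s in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    ]


def train(sites: Sequence[Sequence[Sample]], rounds: int = 30) -> List[float]:
    """Coordinate training: only weights leave a site, never raw samples."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


# Two hypothetical sites whose private data both follow y = 2x + 1.
site_a = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)]
site_b = [(-0.5, 0.0), (0.5, 2.0)]

w, b = train([site_a, site_b])
print(f"w={w:.3f}, b={b:.3f}")
```

In a real deployment the local updates would run on separate machines behind each participant's firewall; the single-process loop here only stands in for that exchange.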
Substra can be used with all common ML frameworks to run and monitor your own ML algorithms on remote data sets, both for testing and for prediction. Conversely, researchers can make their own data sets available to other users through finely configurable permission rules. In both scenarios, Substra prevents users from viewing or downloading other parties' data.
Share ML models, share data
The framework builds on distributed ledger technology (blockchain), which also allows competing teams to collaborate on a shared, virtual data pool without compromising data protection. Substra can be configured flexibly for different use cases and ensures transparency through a continuous, immutable audit trail that records every operation carried out on the platform. This means that the ML models of large-scale projects can even be certified if necessary.
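The principle behind a tamper-evident audit trail can be sketched with a simple hash chain. This is not Substra's implementation and not a distributed ledger: the `AuditLog` class, its fields, and the example operations are hypothetical, and the sketch runs on a single machine. It only shows why a later change to a recorded operation becomes detectable.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _digest(record: dict) -> str:
    """Return the SHA-256 hex digest of a record in canonical JSON form."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to its predecessor."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, actor: str, operation: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        record = {
            "index": len(self._entries),
            "actor": actor,
            "operation": operation,
            "prev_hash": prev_hash,
        }
        # The hash covers every field above, including the previous hash.
        record["hash"] = _digest(record)
        self._entries.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("team-a", "register dataset")
log.append("team-b", "train model on remote dataset")
print(log.verify())
```

A blockchain adds replication and consensus among several parties on top of this chaining, so that no single participant can quietly rewrite the whole log.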