Artificial intelligence is rapidly transforming our society. Machine learning models will be components in nearly every digital system we use. For this reason, there is an urgent need for methods and software that allow for the development of state-of-the-art ML models while protecting the integrity of data owners.
In this project, which started in 2019, we work on algorithms, highly scalable implementations, and applications of Federated Learning (FedML), an approach to training ML models while preserving the privacy of the data owners' input data.
FEDn – a software framework for scalable federated learning
We approach the project from a distributed computing perspective, and the overarching goal with FEDn is to provide a highly scalable, robust, resilient and secure framework that can, depending on deployment-level tuning, effectively handle both cross-silo and cross-device use-cases. We propose a highly scalable architecture drawing on the Actor model and utilizing hierarchical aggregation capabilities for horizontal scalability.
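To make the idea of hierarchical aggregation concrete, here is a minimal sketch in plain Python. It is not FEDn code: the function names, the flat list-of-floats model representation, and the use of example-count-weighted averaging are all assumptions chosen for illustration. Clients are grouped under intermediate aggregators, each group is averaged on its own, and the partial results are then combined.

```python
from typing import List, Tuple

# (model weights, number of local training examples); hypothetical representation
Update = Tuple[List[float], int]


def weighted_average(updates: List[Update]) -> Update:
    """Average weight vectors, weighting each by its example count."""
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("cannot aggregate updates with zero examples")
    dim = len(updates[0][0])
    averaged = [
        sum(weights[i] * n for weights, n in updates) / total
        for i in range(dim)
    ]
    return averaged, total


def hierarchical_average(groups: List[List[Update]]) -> List[float]:
    """Aggregate within each group first, then combine the partial aggregates."""
    partials = [weighted_average(group) for group in groups if group]
    global_weights, _ = weighted_average(partials)
    return global_weights


# Hypothetical example: two intermediate aggregators, three clients in total.
groups = [
    [([1.0, 2.0], 10), ([3.0, 4.0], 30)],  # clients in the first group
    [([5.0, 6.0], 60)],                    # client in the second group
]
global_model = hierarchical_average(groups)

# Aggregating all clients in a single step, for comparison.
flat_model, _ = weighted_average([u for g in groups for u in g])
```

Because each partial aggregate carries its total example count, the two-level result matches the single-step average, which is what allows aggregation work to be spread over several nodes.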
Another central goal of FEDn is an ML-framework-agnostic backend. For this reason, we treat local model training in a black-box manner. The performance-utility tradeoff that results from this design objective is one of the questions we address in this project.
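One way to picture black-box local training is a client loop that only knows the shape of the call, not what happens inside it. The sketch below is a hypothetical illustration, not the FEDn interface: the `TrainFn` signature, `run_client_round`, and `toy_train` are invented names, and the model is assumed to be a flat list of floats.

```python
from typing import Callable, List, Tuple

# Opaque training step (hypothetical signature): takes the current global
# weights and returns updated weights plus the number of local examples used.
TrainFn = Callable[[List[float]], Tuple[List[float], int]]


def run_client_round(global_weights: List[float], train: TrainFn) -> Tuple[List[float], int]:
    """Run one local round without inspecting how `train` works."""
    new_weights, n_examples = train(list(global_weights))  # pass a copy
    if len(new_weights) != len(global_weights):
        raise ValueError("training step changed the model shape")
    return new_weights, n_examples


def toy_train(weights: List[float]) -> Tuple[List[float], int]:
    """Stand-in for a real training script written in any ML framework."""
    return [w - 0.5 for w in weights], 8


update = run_client_round([1.0, 2.0], toy_train)
```

The only contract in this sketch is the input and output format, so the training step could be swapped for code in any framework as long as it serializes its model to the agreed representation.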
Security and trust in FedML
When several different actors work together on a federated machine learning model, it is a challenge to trust that the generated model is secure, maintains full data privacy, and is not misused by anyone in the group. In this project we are working on integrating blockchain technologies into the FEDn platform to enable fully decentralized, trustless construction of FedML models.
We are pursuing research on improved algorithms for FedML. For example, based on our work on FEDn and our proposed architecture, we are working on highly scalable implementations of Secure Gradient Boosting. We are also pursuing improved performance in cross-device use-cases based on transfer learning. Another line of research is meta-models and ensembles in a federated setting. Together with our collaborator Ola Spjuth, we look into federated conformal prediction. Another area of interest is scalable measurement of client contributions to the federated model.
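As a point of reference for what measuring client contributions can mean, the sketch below implements a simple leave-one-out baseline: a client's contribution is the score of the model aggregated from all clients minus the score of the model aggregated without that client. This is a generic baseline and not the method developed in the project; the `score` function, the reference model, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Update = Tuple[List[float], int]  # (model weights, number of local examples)


def fed_avg(updates: List[Update]) -> List[float]:
    """Example-count-weighted average of the client weight vectors."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def leave_one_out(updates: Dict[str, Update],
                  score: Callable[[List[float]], float]) -> Dict[str, float]:
    """Contribution = score with all clients minus score without the client."""
    if len(updates) < 2:
        raise ValueError("leave-one-out needs at least two clients")
    full = score(fed_avg(list(updates.values())))
    contributions = {}
    for name in updates:
        rest = [u for other, u in updates.items() if other != name]
        contributions[name] = full - score(fed_avg(rest))
    return contributions


# Hypothetical validation score: negative squared distance to a reference model.
reference = [1.0, 1.0]


def score(weights: List[float]) -> float:
    return -sum((w - r) ** 2 for w, r in zip(weights, reference))


updates = {
    "a": ([1.0, 1.0], 10),
    "b": ([1.0, 1.0], 10),
    "c": ([4.0, 4.0], 10),  # an outlier that pulls the average away
}
contributions = leave_one_out(updates, score)
```

This baseline needs one extra aggregation and evaluation per client, so its cost grows with the number of clients, which is one reason scalable alternatives are of interest.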
Biomedical image processing
Biomedical image processing is an important area where privacy concerns prohibit the pooling of data to train machine learning models. Federated learning can be used to overcome this problem, but because the models and datasets are large, it is important to find training strategies that minimize the number of training rounds.
In this project we are collaborating with Fredrik Löfman’s group at RaySearch Laboratories on applications of FedML to 3D segmentation problems. This collaboration is funded via the eSSENCE collaboration on eScience.
Community-driven federated machine learning framework for cloud operators
In this project the aim is accurate predictive modeling of resource usage parameters in typical data center environments. Given such models, it is possible to throttle resources intelligently based on predicted demand. This can lead to substantial savings in operational cost and greener computing. We have proposed a FedML-based solution that lets data center operators easily pool resource consumption patterns in a privacy-preserving setting and benefit from shared knowledge. We are now investigating the integration of finer-grained information, including application-level usage patterns. The next step is to leverage the models to solve optimization problems in which an operator can specify scenarios such as: how can we optimally allocate resources so that all service level agreements are met, but our electricity bill does not exceed X units?
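The kind of scenario described above can be illustrated with a small sketch. It assumes a single resource type, a flat electricity cost per allocated unit, and a per-service minimum that stands in for the service level agreement; all names and numbers are hypothetical. It uses a simple proportional heuristic rather than a true optimizer: every service first receives its minimum, and any budget left over is shared in proportion to the predicted demand above that minimum.

```python
from typing import Dict, Optional


def allocate(predicted: Dict[str, float], sla_min: Dict[str, float],
             cost_per_unit: float, budget: float) -> Optional[Dict[str, float]]:
    """Return per-service allocations, or None if the SLA minimums alone exceed the budget."""
    allocation = dict(sla_min)
    remaining_units = budget / cost_per_unit - sum(sla_min.values())
    if remaining_units < 0:
        return None
    # Extra capacity each service is predicted to want beyond its SLA minimum.
    extra = {s: max(predicted[s] - sla_min[s], 0.0) for s in sla_min}
    total_extra = sum(extra.values())
    if total_extra > 0:
        # Never hand out more than was predicted, even if the budget allows it.
        share = min(1.0, remaining_units / total_extra)
        for s in allocation:
            allocation[s] += share * extra[s]
    return allocation


# Hypothetical predicted demands (for example, from a federated model) and SLA minimums.
predicted = {"web": 8.0, "batch": 4.0}
sla_min = {"web": 4.0, "batch": 2.0}

plan = allocate(predicted, sla_min, cost_per_unit=2.0, budget=18.0)
infeasible = allocate(predicted, sla_min, cost_per_unit=2.0, budget=10.0)
```

A realistic formulation would likely involve several resource types and time-varying prices, and would be solved with a proper optimization solver. The sketch only shows how predicted demand, SLA constraints, and a cost ceiling fit together.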
In this project we are using data from two academic cloud providers, the SNIC Science Cloud in Sweden, and CSC in Finland.
Morgan Ekmefjord, Desislava Stoyanova, Ola Spjuth, Salman Toor, and Andreas Hellander, FEDn – A scalable framework for federated machine learning (manuscript in preparation)
Prashant Singh, Mona Mohamad Elamin and Salman Toor, Towards Smart e-Infrastructures, A Community Driven Approach Based on Real Datasets, in Proceedings of the IEEE GreenTech Conference, 2020 (accepted)
Felix Morsbach, Hardened Model Aggregation for Federated Learning backed by Distributed Trust: Towards decentralizing Federated Learning using a Blockchain, MSc thesis, 2020.
Meenal Pathak, Mohamed Hussein, Studying Data Distribution Dependencies In Federated Learning, 2020.
Jiaong Liang, Federated Learning for Bioimage Classification, MSc thesis, 2020.
Mona Babikir Abdelhamid Mohamed Elamin, Machine Learning for Cloud: Modeling Cluster Health using Usage Parameters, MSc thesis, 2019.