We have opportunities for a number of MSc thesis students to work with the group in the spring semester 2020.
Artificial intelligence is rapidly transforming our society. Machine learning models are becoming part of nearly every digital system we use, and it is imperative that we protect the privacy of data owners. In this project we work on training schemes, scalable implementations, and applications of Federated Learning, a recent approach to training ML models while keeping the data owners' input data private.
Federated Machine Learning
Federated machine learning (FedML) has recently attracted a lot of attention in both industry and academia. Simply put, training proceeds by model updates on private data nodes, after which a server averages the weights to form a global model (schematic figure inline). While simple in concept, care must be taken to balance local model training against global synchronization in order to avoid poor convergence and to minimize the number of communication rounds. FedML differs from standard distributed learning/optimization in that data cannot be assumed to be balanced across nodes, data may not be i.i.d., and we can assume neither consistent node uptime nor low-latency, high-throughput networking between nodes. During 2017 and 2018, Google Research presented an approach to FedML based on TensorFlow targeting mobile devices [2,3]. Other prominent efforts include the open source project OpenMined (https://www.openmined.org/) and TensorFlow Federated, the latest API extension of TensorFlow. Intel, in collaboration with the University of Pennsylvania, recently demonstrated a real-world case for FedML based on biomedical imaging. Machine learning models that have been demonstrated in the FedML setting include CNNs, LSTMs and conformal predictors. In our group we are currently working on various aspects of FedML, such as new federated ensemble methods and schemes to measure individual member contributions in a scalable fashion.
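To make the training loop described above concrete, here is a minimal sketch of federated averaging in Python with NumPy. It is an illustration under stated assumptions, not the group's implementation: the model is a plain linear regressor, the three nodes and their data are synthetic, and the names `local_update`, `federated_average` and `run_rounds` are hypothetical helpers introduced for this example.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Node step: a few gradient-descent steps of least squares on private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def federated_average(client_weights, client_sizes):
    """Server step: average the node models, weighted by local sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def run_rounds(clients, dim, rounds=50):
    """Alternate local training and global averaging for a fixed number of rounds."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        # Only model weights leave the nodes; X and y stay local.
        updates = [local_update(global_w, X, y) for X, y in clients]
        global_w = federated_average(updates, [len(y) for _, y in clients])
    return global_w


# Synthetic, deliberately unbalanced nodes (assumption for illustration only).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (20, 200, 50):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

global_w = run_rounds(clients, dim=2)
print(global_w)
```

Weighting the average by sample count is one common way to handle unbalanced data across nodes. The `epochs` and `rounds` parameters expose the trade-off mentioned above between local training and the number of communication rounds.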
Potential thesis topics
We have opportunities for MSc thesis students in a number of areas in privacy-preserving learning, such as:
- Performance evaluation and optimization of federated learning algorithms for new application areas and/or models.
- Development of new FedML schemes.
- Development of scalable computing backends.
- Decentralized implementations to enable FedML without a trusted third party.
- Privacy-enhancing techniques such as differential privacy and secure multiparty computation.
The work will be conducted as part of the research group Integrative Scalable Computing Laboratory (ISCL). ISCL is an interdisciplinary team working at the interface of scientific computing, machine learning and distributed systems. The group runs a number of eScience projects with funding from eSSENCE, SSF, VR and NIH. The MSc student will get the opportunity to participate in the work of the group during the semester in which the thesis is written, gaining insight into the academic work culture.
Reach out to Andreas Hellander or Salman Toor to discuss opportunities.
References
[1] Feng X., Qing K., Meyer C. H. and Chen Q., Deep convolutional neural network for segmentation of thoracic organ-at-risk using cropped 3D images, Med. Phys., 46(5), 2019.
[2] Konečný J., McMahan H. B., Yu F. X., Richtárik P., Suresh A. T. and Bacon D., Federated Learning: Strategies for Improving Communication Efficiency, arXiv:1610.05492, 2016.
[3] Bonawitz K. et al., Towards Federated Learning at Scale: System Design, arXiv:1902.01046, 2019.
[4] TensorFlow Federated, https://www.tensorflow.org/federated
[5] Sheller M. J., Reina G. A., Edwards B., Martin J. and Bakas S., Multi-institutional Deep Learning Modeling Without Sharing Patient Data: A Feasibility Study on Brain Tumor Segmentation, Lecture Notes in Computer Science, vol. 11383, 2019.
[6] Gauraha N. and Spjuth O., Synergy Conformal Prediction, DiVA preprint 360504, 2018. URL: urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-360504
[7] Bagdasaryan E., Veit A., Hua Y., Estrin D. and Shmatikov V., How to Backdoor Federated Learning, arXiv:1807.00459, 2018.