Calendar
5 June
Master's Thesis presentation: "Predicting Antibody Developability: Machine Learning Meets Therapeutic Antibodies"
Authors: Josephine Höjding and William Björkhem
Supervisors: Mikael Nilsson (Centre for Mathematical Sciences) and Morten Krogh (Bionamic AB)
Examiner: Karl Åström (Centre for Mathematical Sciences)
Abstract
Antibody developability refers to an antibody's suitability for clinical use, covering properties such as solubility, stability, and aggregation propensity. These traits are traditionally assessed through experimental screening, which is time-consuming and resource-intensive. Machine learning offers a promising alternative for predicting developability at an early stage, although many existing models are still immature.
This work compares several machine learning strategies for predicting protein solubility, a key developability factor. Five datasets were used: four containing non-antibody protein sequences expressed in E. coli with solubility labels, and one independent, unlabeled antibody dataset. Three existing models (NetSolP, SWI, and ProteinSol) were evaluated using standard performance metrics, and new models were built on features extracted with SWI and ProteinSol to explore potential improvements.
The approaches developed included logistic regression for direct solubility prediction; two-stage models that first classified a sample's most likely dataset of origin and then applied the solubility model for that dataset; clustering-based methods with a separate classifier for each cluster; and multi-layer perceptrons to test whether deeper architectures bring any benefit.
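The two-stage "dataset of origin" idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes each protein has already been turned into a numeric feature vector (in the thesis these come from SWI and ProteinSol; here they are hypothetical 2-D toy vectors). A hand-rolled nearest-centroid classifier stands in for the origin classifier, and a gradient-descent logistic regression serves as each per-dataset solubility model. All class names are illustrative.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegressionGD:
    """Binary logistic regression fitted with full-batch gradient descent."""

    def __init__(self, lr=0.1, epochs=500):
        self.lr, self.epochs = lr, epochs
        self.w, self.b = [], 0.0

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # gradient of log-loss w.r.t. logit
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x):
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)

    def predict(self, x):
        return int(self.predict_proba(x) >= 0.5)


class NearestCentroidRouter:
    """Hypothetical origin classifier: assigns x to the closest dataset centroid."""

    def fit(self, X, origins):
        groups = defaultdict(list)
        for x, o in zip(X, origins):
            groups[o].append(x)
        self.centroids = {
            o: [sum(col) / len(rows) for col in zip(*rows)] for o, rows in groups.items()
        }
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda o: sq_dist(self.centroids[o]))


class RoutedSolubilityModel:
    """Stage 1 predicts the dataset of origin; stage 2 applies that dataset's model."""

    def fit(self, X, y, origins):
        self.router = NearestCentroidRouter().fit(X, origins)
        self.experts = {}
        for ds in set(origins):
            idx = [i for i, o in enumerate(origins) if o == ds]
            self.experts[ds] = LogisticRegressionGD().fit(
                [X[i] for i in idx], [y[i] for i in idx]
            )
        return self

    def predict(self, x):
        return self.experts[self.router.predict(x)].predict(x)


# Toy data: in dataset "A" a high second feature means soluble (1),
# in dataset "B" the relationship is reversed.
X = [[0, 1], [0, 2], [0, -1], [0, -2], [10, 1], [10, 2], [10, -1], [10, -2]]
y = [1, 1, 0, 0, 0, 0, 1, 1]
origins = ["A"] * 4 + ["B"] * 4

model = RoutedSolubilityModel().fit(X, y, origins)
print(*[model.predict(p) for p in ([0, 1.5], [0, -1.5], [10, 1.5], [10, -1.5])])  # 1 0 0 1
```

The toy data is constructed so that a single linear model cannot fit both datasets at once, while routing to per-dataset models can. In practice, the benefit of this design depends on how well the origin classifier separates the datasets and whether solubility really relates to the features differently in each one.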
Overall, the models achieved similar performance, with no single approach consistently outperforming others. Simpler models like logistic regression often performed on par with more complex models such as multi-layer perceptrons. Results varied by dataset, with the lowest performance observed on the largest and most diverse dataset, PDBSol, suggesting that high variability in sequence data may reduce prediction reliability.