Statistical Inference and Stochastic Modelling of Protein Folding
This project is ongoing and is supported by a grant from the Swedish Research Council (VR project id 2013-5167).
In this research project a specific biological phenomenon, known as “protein folding”, is studied. Protein folding is a spontaneous process that transforms a disordered polymer into a specific three-dimensional structure. The correct three-dimensional structure is essential to function, although some parts of functional proteins may remain “unfolded”. In addition to this central role in biology, protein folding is also associated with a wide range of human diseases. In cystic fibrosis, for example, mutations result in faulty protein folding and hence a lack of functional protein. In many neurodegenerative diseases, such as Alzheimer's disease, proteins “misfold” into toxic protein structures. Because of its biological importance, protein folding has received enormous interest from experiment, theory and simulation, and substantial progress has been made in our understanding of this complex process. Many questions about how proteins fold nevertheless remain unanswered, and we still do not have a general and quantitative theory that describes protein folding.
Mathematical modelling offers a simplified, yet manageable, representation of biological reality. Like all biological phenomena, protein folding has inherently “noisy” kinetics that a mathematical model must take into account, because its dynamics produce erratic behaviour that appears random. However, quantitative tools are available to recognize a “structure” in such apparently chaotic patterns and to extract useful information from them. Such methods are collectively known as “stochastic modelling”.
Computer-assisted molecular dynamics simulations of protein folding will be performed in this research project, and a simplified mathematical representation of such simulations will be developed using the theory of stochastic differential equations (SDEs). SDEs are powerful mathematical tools to represent phenomena whose dynamics appear to be random. We will develop new SDE models able to represent the protein folding dynamics across multiple dimensions, going beyond the current low-dimensional mathematical representations.
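To make the idea of an SDE concrete, the following is a minimal sketch and not one of the project's models. It assumes a one-dimensional toy equation dX = -(X^3 - X) dt + sigma dW, whose double-well potential has two stable states that could loosely stand in for "unfolded" and "folded" conformations, and simulates it with the standard Euler-Maruyama scheme. The function name, the potential and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_double_well(n_steps=10_000, dt=1e-3, sigma=0.7, x0=-1.0, seed=0):
    """Euler-Maruyama path of the toy SDE dX = -(X^3 - X) dt + sigma dW.

    The drift is minus the derivative of the (assumed) double-well
    potential U(x) = x^4/4 - x^2/2, which has minima at x = -1 and x = +1.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    # Brownian increments over a step of length dt are N(0, dt).
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    for k in range(n_steps):
        drift = -(x[k] ** 3 - x[k])
        x[k + 1] = x[k] + drift * dt + sigma * dw[k]
    return x


path = simulate_double_well()
# Fraction of time the path spends in the right-hand well (x > 0).
fraction_right = float(np.mean(path > 0.0))
```

With a larger sigma the path crosses between the two wells more often; a multidimensional model of the kind described above would replace the scalar state by a vector and the scalar drift by a vector field.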
In addition to developing new mathematical models, it is necessary to estimate statistically a number of model quantities whose values are unknown. Performing statistical inference for SDE models is therefore necessary but also very complicated, as we aim to consider multidimensional models of dynamics whose exact values cannot be recorded because the observations are affected by measurement error. Such complications require the use of computationally intensive statistical methodology. We will therefore develop suitable statistical methods, such as likelihood-free methodologies; examples are approximate Bayesian computation (ABC), synthetic likelihoods and pseudo-marginal Markov chain Monte Carlo methods.
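As an illustration of what a likelihood-free method does, here is a minimal sketch of ABC rejection sampling under stated assumptions; it is not the methodology developed in the project. The assumed model is an Ornstein-Uhlenbeck SDE dX = -theta X dt + sigma dW observed with Gaussian measurement error, the unknown is theta, the prior is uniform on [0.1, 5], and the summary statistics (variance and lag-1 autocovariance) are chosen only for simplicity. All function names and numerical settings are hypothetical. Instead of a fixed tolerance, the sketch keeps the fraction of prior draws whose simulated summaries are closest to the observed ones, which is a common practical variant.

```python
import numpy as np


def simulate_noisy_ou(theta, rng, n=200, dt=0.1, sigma=1.0, noise_sd=0.2):
    """Noisy observations of dX = -theta X dt + sigma dW.

    The latent path uses the exact Gaussian transition of the OU process;
    independent N(0, noise_sd^2) measurement error is then added.
    """
    a = np.exp(-theta * dt)
    step_sd = np.sqrt(sigma**2 / (2.0 * theta) * (1.0 - a**2))
    z = rng.normal(size=n)
    x = np.empty(n)
    x[0] = sigma / np.sqrt(2.0 * theta) * z[0]  # draw from the stationary law
    for k in range(1, n):
        x[k] = a * x[k - 1] + step_sd * z[k]
    return x + rng.normal(0.0, noise_sd, size=n)


def summary(y):
    """Summary statistics: sample variance and lag-1 autocovariance."""
    yc = y - y.mean()
    return np.array([np.mean(yc**2), np.mean(yc[1:] * yc[:-1])])


def abc_rejection(y_obs, n_draws=2000, keep_frac=0.05, seed=1):
    """Keep the prior draws whose simulated summaries are closest to the data."""
    rng = np.random.default_rng(seed)
    s_obs = summary(y_obs)
    thetas = rng.uniform(0.1, 5.0, size=n_draws)  # assumed uniform prior
    dist = np.array(
        [np.linalg.norm(summary(simulate_noisy_ou(t, rng)) - s_obs) for t in thetas]
    )
    n_keep = int(keep_frac * n_draws)
    return thetas[np.argsort(dist)[:n_keep]]


# Synthetic "recorded data" generated with theta = 1.5 (known only in this toy).
y_obs = simulate_noisy_ou(1.5, np.random.default_rng(0))
posterior_sample = abc_rejection(y_obs)
```

The accepted values in `posterior_sample` approximate the posterior distribution of theta given the summaries. The likelihood is never evaluated, only simulated from, which is why this family of methods suits models whose likelihood is intractable.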
In summary, the aim of our project is to develop suitable mathematical models for protein folding kinetics, together with the statistical methods that enable the estimation of unknown features from recorded data.