Biochemical reaction networks represent complex cellular regulatory mechanisms. They are commonly analyzed using discrete stochastic simulation models, which typically comprise numerous reactions among a large number of chemical species, governed by highly uncertain parameters.
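To make "discrete stochastic simulation" concrete, the sketch below implements Gillespie's stochastic simulation algorithm (direct method) for a hypothetical one-species birth-death network. The network, the rate names `k_prod` and `k_deg`, and the function name are illustrative assumptions, not models or code from this research.

```python
import numpy as np


def ssa_birth_death(k_prod, k_deg, x0, t_end, rng):
    """Gillespie direct method for the hypothetical network
         0 -> X   with propensity k_prod
         X -> 0   with propensity k_deg * X
    Returns the event times and the copy number of X after each event."""
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod              # zeroth-order production
        a_deg = k_deg * x            # first-order degradation
        a_total = a_prod + a_deg
        if a_total == 0.0:           # no reaction can fire any more
            break
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.exponential(1.0 / a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return np.array(times), np.array(counts)


rng = np.random.default_rng(42)
times, counts = ssa_birth_death(k_prod=10.0, k_deg=0.1, x0=0, t_end=200.0, rng=rng)
```

Every run with a different seed gives a different trajectory; after an initial transient the copy number fluctuates around the stationary mean `k_prod / k_deg = 100`.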
Likelihood-free parameter inference
Given existing data pertaining to a biochemical reaction network, one is often interested in inferring the values of the model parameters that likely generated the data. The data may come from earlier simulations or from physical experiments. Approximate Bayesian Computation (ABC) is a well-established approach to such parameter inference problems: rather than evaluating a likelihood, it uses the simulation model itself to find the region of the parameter space whose simulated output deviates least from the given data.
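Formally, using notation introduced here for illustration (it does not appear in the source): let π(θ) be the prior, y ~ p(· | θ) a simulated data set, y_obs the observed data, S the summary statistics, ρ the distance function and ε the tolerance. ABC replaces the intractable posterior with

```latex
\pi_\epsilon(\theta \mid y_{\mathrm{obs}}) \;\propto\; \pi(\theta)\,
  \Pr\!\Big(\rho\big(S(y),\, S(y_{\mathrm{obs}})\big) < \epsilon \;\Big|\; \theta\Big),
  \qquad y \sim p(\cdot \mid \theta).
```

As ε → 0 this approaches π(θ | S(y_obs)), which equals the exact posterior π(θ | y_obs) only when S is a sufficient statistic.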
The rejection sampling algorithm forms the basis of the ABC framework. Parameter samples are drawn from a specified prior distribution, and the model is simulated for each sample. The simulated responses are compared to the existing data by means of a distance function computed on appropriate summary statistics. Samples whose distance falls below a specified tolerance threshold are accepted, and the rest are rejected. Sampling continues until the desired number of accepted samples has been obtained. The accepted samples form an approximate sample from the posterior distribution, and the inferred parameters are reported as the mean of the accepted samples.
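The rejection loop described above can be sketched as follows. This is a minimal, generic sketch: the function names, the toy Poisson "simulator" (standing in for an expensive stochastic model), the summary statistics (sample mean and standard deviation), the Euclidean distance and the tolerance value are all illustrative assumptions, not choices made in this research.

```python
import numpy as np


def abc_rejection(simulate, summarize, distance, prior_sample, observed,
                  tol, n_accept, rng, max_draws=1_000_000):
    """Rejection ABC: keep prior draws whose simulated summaries lie within `tol`
    of the observed summaries, until `n_accept` draws have been accepted."""
    s_obs = summarize(observed)
    accepted, distances = [], []
    for _ in range(max_draws):
        theta = prior_sample(rng)                  # 1. draw from the prior
        s_sim = summarize(simulate(theta, rng))    # 2. simulate and summarize
        d = distance(s_sim, s_obs)                 # 3. compare with the data
        if d < tol:                                # 4. accept or reject
            accepted.append(theta)
            distances.append(d)
            if len(accepted) == n_accept:
                break
    else:
        raise RuntimeError("max_draws reached before n_accept samples were accepted")
    return np.array(accepted), np.array(distances)


# Hypothetical toy problem: the "simulator" draws 50 Poisson counts with rate theta.
def simulate(theta, rng):
    return rng.poisson(theta, size=50)


def summarize(x):
    return np.array([x.mean(), x.std()])


def distance(a, b):
    return np.linalg.norm(a - b)


def prior_sample(rng):
    return rng.uniform(0.0, 50.0)


rng = np.random.default_rng(0)
observed = simulate(20.0, rng)   # synthetic "data" generated with theta = 20
samples, dists = abc_rejection(simulate, summarize, distance, prior_sample,
                               observed, tol=1.0, n_accept=100, rng=rng)
theta_hat = samples.mean()       # point estimate: mean of the accepted samples
```

Most prior draws are rejected here, which is exactly the cost problem that grows with model complexity: every rejected draw still pays for a full simulation.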
Design choices such as the distance function, the summary statistics and the acquisition function used in the inference process strongly affect the quality of the solution. Furthermore, as problem complexity increases, rejection sampling often leads to impractically long inference times, because every candidate sample requires an expensive simulation and acceptance rates tend to drop.
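As a small illustration of why the distance function matters: when summary statistics live on very different scales, a plain Euclidean distance is dominated by the largest one. One common remedy, shown here as an assumption rather than as the method used in this research, is to divide each statistic by its median absolute deviation (MAD) over a pilot set of simulations. The statistic values below are made up for illustration.

```python
import numpy as np


def mad_scale(pilot_summaries):
    """Median absolute deviation of each summary statistic over pilot simulations.
    A statistic with zero MAD would need special handling (division by zero)."""
    med = np.median(pilot_summaries, axis=0)
    return np.median(np.abs(pilot_summaries - med), axis=0)


def euclidean(s_sim, s_obs):
    return np.linalg.norm(s_sim - s_obs)


def scaled_euclidean(s_sim, s_obs, scale):
    """Euclidean distance after dividing each statistic by its typical spread."""
    return np.linalg.norm((s_sim - s_obs) / scale)


# Hypothetical summaries: [mean copy number, lag-1 autocorrelation].
s_obs = np.array([100.0, 0.5])
a = np.array([110.0, 0.5])   # off in the mean only
b = np.array([101.0, 0.9])   # close in the mean, far off in the autocorrelation
pilot = np.array([[90.0, 0.4], [100.0, 0.5], [110.0, 0.6]])
scale = mad_scale(pilot)     # approximately [10.0, 0.1]

print(euclidean(a, s_obs), euclidean(b, s_obs))                              # ~10.0, ~1.08: b looks closer
print(scaled_euclidean(a, s_obs, scale), scaled_euclidean(b, s_obs, scale))  # ~1.0, ~4.0: a looks closer
```

The two distances rank the same pair of simulations in opposite order, so the set of accepted samples, and hence the inferred parameters, depends directly on this choice.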
Our research explores ways to accelerate high-quality parameter inference by drawing on state-of-the-art techniques from computational biology, machine learning, optimization and statistics. Active research topics include the intelligent construction of priors, automated large-scale selection of summary statistics, and training fast local and global approximations, or surrogate models, of computationally expensive simulators.
The exploration of a system described by a non-linear, high-dimensional and stochastic computational model is a fundamental problem in all scientific disciplines that rely on modeling and simulation. In this project we are interested in the scenario where a modeler has little or no prior knowledge of the kinds of qualitatively interesting behavior the model can display across its large parameter space. The tools we develop should help the modeler discover those behaviors with a small computational budget and as little manual work as possible. Using human-in-the-loop machine learning, we are developing a smart parameter sweep workflow. An example is shown in the image below, where a high-dimensional parameter sweep application is augmented with automated feature extraction and clustering, followed by training a classification model on user-defined labels (such as interesting or non-interesting realizations). With this model, the smart sweep application learns to explore interesting regions of the parameter space more efficiently.
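The workflow described above (sweep, feature extraction, clustering, user labels, classifier-guided exploration) can be sketched in a few dozen lines. This is a toy sketch under stated assumptions, not the smart sweep application itself: the damped-oscillator "simulator", the two features, plain k-means, the rule standing in for the user's labels and the k-nearest-neighbour classifier are all hypothetical choices.

```python
import numpy as np

# Hypothetical toy setup: two parameters (frequency, damping) with box bounds.
LO = np.array([0.5, 0.0])
HI = np.array([5.0, 1.0])
T = np.linspace(0.0, 10.0, 400)


def simulate(theta, rng):
    """Stand-in for an expensive stochastic simulator: damped oscillation, random phase."""
    phase = rng.uniform(0.0, 2.0 * np.pi)
    return np.sin(theta[0] * T + phase) * np.exp(-theta[1] * T)


def extract_features(x):
    """Automated feature extraction: late-time amplitude and number of zero crossings."""
    late_amplitude = np.abs(x[T > 5.0]).max()
    crossings = np.count_nonzero(np.diff(np.sign(x)))
    return np.array([late_amplitude, crossings], dtype=float)


def kmeans(Z, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns a cluster index for every row of Z."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        centers = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels


def user_label(mean_features):
    """Hypothetical stand-in for the human: 'interesting' means sustained oscillation."""
    late_amplitude, crossings = mean_features
    return int(late_amplitude > 0.1 and crossings >= 3)


def knn_score(train_theta, train_y, query, k=5):
    """Fraction of the k nearest labelled parameter points (range-scaled) that are interesting."""
    d = (((query[:, None, :] - train_theta[None, :, :]) / (HI - LO)) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)


def smart_sweep_round(n_initial=200, n_clusters=4, n_candidates=1000, n_next=50, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initial, uninformed parameter sweep.
    theta = rng.uniform(LO, HI, size=(n_initial, 2))
    F = np.array([extract_features(simulate(th, rng)) for th in theta])
    # 2. Cluster the realizations in standardized feature space.
    Z = (F - F.mean(axis=0)) / F.std(axis=0)
    labels = kmeans(Z, n_clusters, rng)
    # 3. The "user" labels each cluster; every realization inherits its cluster's label.
    cluster_y = np.array([user_label(F[labels == j].mean(axis=0)) if np.any(labels == j) else 0
                          for j in range(n_clusters)])
    y = cluster_y[labels]
    # 4. A classifier in parameter space proposes where to simulate next.
    candidates = rng.uniform(LO, HI, size=(n_candidates, 2))
    scores = knn_score(theta, y, candidates)
    next_theta = candidates[np.argsort(-scores)[:n_next]]
    return next_theta, scores, y


next_theta, scores, y = smart_sweep_round()
```

In a real workflow, step 3 is where the human enters the loop, and the round would repeat: simulate `next_theta`, extract features, re-cluster and retrain. The k-NN scorer is used only because it fits in a few lines; any probabilistic classifier could take its place.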