The Supercomputer MACH-2: Use Cases

Use Case: Optimal Bayesian design for model discrimination via classification

Description of the Application

Before actually conducting a statistical experiment, one might be interested in determining at which setting of the controllable factors of the experiment the expected gain of knowledge from the experiment is optimal. Design of experiments is concerned with quantifying the information gain and finding the optimal factor setting (design). Many statistical models are so complicated, however, that standard measures of information gain cannot be computed analytically.

Our approach uses simulated data from the statistical models of interest to estimate the information gain at a given experimental design. In our project, we were interested in finding the designs that are most informative for model discrimination, that is, for selecting the most plausible model among a set of candidate models, each of which might explain the problem at hand. To this end, we apply supervised classification methods (in particular classification trees and random forests) to the simulated data and estimate the classification accuracy at each design.
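The idea of estimating classification accuracy from simulated data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the project: the two toy candidate models, the prior, the noise level, the design grid, and the function names are hypothetical. A depth-one classification tree (a decision stump) stands in for the classification trees and random forests mentioned above.

```python
import random

def simulate(model, t, rng):
    """Draw one observation at design t from a hypothetical toy model."""
    theta = rng.uniform(0.5, 1.5)                      # assumed prior on the parameter
    mean = theta * t if model == 0 else theta * t * t  # linear vs. quadratic growth
    return rng.gauss(mean, 0.5)                        # assumed observation noise

def fit_stump(ys, labels):
    """Fit a depth-one classification tree: a threshold and an orientation."""
    best = (-1, 0.0, 1)  # (correct count, threshold, label predicted when y > threshold)
    n = len(ys)
    for thr in sorted(set(ys)):
        # Number of correct predictions if we predict model 1 whenever y > thr.
        correct_1 = sum(1 for y, m in zip(ys, labels) if (y > thr) == (m == 1))
        for label_above, correct in ((1, correct_1), (0, n - correct_1)):
            if correct > best[0]:
                best = (correct, thr, label_above)
    return best[1], best[2]

def estimated_accuracy(t, n_train=400, n_test=400, seed=0):
    """Estimate how well the two toy models can be told apart at design t."""
    rng = random.Random(seed)

    def sample(n):
        labels = [rng.randrange(2) for _ in range(n)]  # equal prior model probabilities
        ys = [simulate(m, t, rng) for m in labels]
        return ys, labels

    thr, label_above = fit_stump(*sample(n_train))     # train on simulated data
    ys, labels = sample(n_test)                        # evaluate on fresh simulations
    preds = [label_above if y > thr else 1 - label_above for y in ys]
    return sum(p == m for p, m in zip(preds, labels)) / n_test

designs = [0.5, 1.0, 1.5, 2.0]                         # hypothetical design grid
accuracies = {t: estimated_accuracy(t) for t in designs}
best_design = max(accuracies, key=accuracies.get)
print(accuracies, best_design)
```

In this toy setting the two models have the same mean at t = 1, so the estimated accuracy there should be close to chance, while designs at which the model predictions separate yield higher accuracy. In a realistic setting the stump would be replaced by a proper classifier and the grid by an optimization algorithm.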

One practical application of our methodology is in cell experiments. There, our method helps to determine the optimal observation time for finding out which model best explains the development of the number of bacteria within macrophages. We also tested our method on many other statistical models where it can be beneficial, for example the standard epidemiological SIR (susceptible-infected-recovered) models and different kinds of spatial extremes models.

Importance of MACH-2

Although the proposed approach greatly reduces the number of simulations required to estimate the expected information gain for a given design compared to previous approaches, it remains very simulation-intensive. Because we search for the optimal design using an optimization algorithm, the information gain has to be estimated many times during the course of that algorithm, so the computational demands are still very high. We also wanted to run the optimizer from multiple starting designs, which requires parallel processing capabilities that MACH-2 was able to provide. Since our study serves as a proof of concept, we investigated various methods, settings, and design dimensions for our examples, which added up to very high computational demands in total. Thanks to MACH-2, we were able to investigate all the settings we were interested in and run our examples in an acceptable amount of time.
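The multi-start pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the noisy utility function is a hypothetical stand-in for a simulation-based estimate, the random-walk hill climber is a generic local search rather than the optimizer used in the project, and a thread pool is used purely for portability. Because the runs are independent, they could equally be mapped to separate processes or cluster jobs.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def noisy_utility(t, rng):
    """Hypothetical stand-in for a simulation-based utility estimate at design t."""
    return -(t - 1.3) ** 2 + rng.gauss(0.0, 0.01)

def local_search(start, seed, steps=200, step_size=0.1, lower=0.0, upper=2.0):
    """Random-walk hill climbing from one starting design (generic local search)."""
    rng = random.Random(seed)            # one independent random stream per run
    t, u = start, noisy_utility(start, rng)
    for _ in range(steps):
        # Propose a nearby design, clipped to the admissible design range.
        cand = min(upper, max(lower, t + rng.uniform(-step_size, step_size)))
        u_cand = noisy_utility(cand, rng)
        if u_cand > u:                   # keep the candidate if its estimate is better
            t, u = cand, u_cand
    return t, u

starts = [0.1, 0.7, 1.2, 1.9]            # hypothetical starting designs
with ThreadPoolExecutor(max_workers=len(starts)) as pool:
    # Each run is independent, so the runs can be dispatched concurrently.
    results = list(pool.map(local_search, starts, range(len(starts))))

best_t, best_u = max(results, key=lambda r: r[1])
print(results, best_t)
```

Since every utility evaluation is itself a noisy estimate, the reported best value tends to be slightly optimistic. Re-estimating the utility at the final designs with fresh simulations is one common way to compare the runs more fairly.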

References

  1. Hainy, M., Price, D.J., Restif, O. & Drovandi, C. (2022): Optimal Bayesian design for model discrimination via classification. Statistics and Computing 32 (2): 25. https://doi.org/10.1007/s11222-022-10078-2


JKU Scientific Computing Administration