Evolutionary non-linear modelling for selecting vaccines against antigenically-variable viruses
Motivation: In vitro and in vivo selection of vaccines is time consuming and expensive, and the selected vaccines may not protect against a broad spectrum of viruses because antigenically novel disease strains keep emerging. A powerful computational model that incorporates these protein/DNA or RNA level fluctuations can effectively predict antigenically variant strains, minimise the resources spent on exclusive serological testing of vaccines, and make wide-spectrum vaccines possible for many diseases. However, in silico vaccine prediction remains a grand challenge. To address this challenge, we investigate the use of linear and non-linear regression models to predict antigenic similarity among foot-and-mouth disease virus (FMDV) strains and among influenza strains, where the structure and parameters of the non-linear model are optimised using an evolutionary algorithm. In addition, we examine two different scoring methods for weighting the type of amino acid substitutions in the linear and non-linear models. We also test our models on unseen data.
Results: We achieved the best prediction results on three data sets of SAT2 (foot-and-mouth disease), two data sets of serotype A (foot-and-mouth disease) and two data sets of influenza when the scoring method based on biochemical properties of amino acids was employed in combination with a non-linear regression model. Models based on substitutions in the antigenic areas performed better than those that considered the entire exposed viral capsid proteins. A majority of the non-linear regression models optimised with the evolutionary algorithm performed better than the linear and non-linear models whose parameters were estimated using the least squares method. In addition, for the best models, the optimised non-linear regression models consist of more terms than their linear counterparts, implying that the influence of amino acid substitutions on antigenic similarity is non-linear.
Our models were also tested on five recently generated FMDV data sets, on which the best model achieved an 80% agreement rate.
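As a rough illustration of the modelling idea summarised above, and not the authors' implementation, the following Python sketch compares a linear regression on per-site substitution scores with a non-linear model whose terms are chosen by a toy evolutionary search. Everything here is an assumption made for illustration: the data are synthetic, the names (`fit_sse`, `evolve_terms`, `PENALTY`) are hypothetical, the non-linear terms are restricted to pairwise products, and the coefficients are fitted by least squares inside the search for brevity, whereas the abstract describes the evolutionary algorithm optimising both structure and parameters.

```python
import itertools
import numpy as np

PENALTY = 0.01  # hypothetical cost per model term, to discourage oversized models

def design_matrix(X, terms):
    """Intercept column plus one product column per term (a tuple of site indices)."""
    cols = [np.ones(len(X))]
    for term in terms:
        cols.append(np.prod(X[:, list(term)], axis=1))
    return np.column_stack(cols)

def fit_sse(X, y, terms):
    """Least-squares fit for a fixed term set; returns (coefficients, sum of squared errors)."""
    A = design_matrix(X, terms)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)

def evolve_terms(X, y, generations=60, pop_size=20, seed=0):
    """Toy evolutionary search over which single-site and pairwise terms to include."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    candidates = [(i,) for i in range(p)] + list(itertools.combinations(range(p), 2))

    def fitness(mask):
        terms = [t for t, keep in zip(candidates, mask) if keep]
        return fit_sse(X, y, terms)[1] + PENALTY * len(terms)

    # Random initial population; individual 0 is the purely linear model.
    pop = rng.random((pop_size, len(candidates))) < 0.3
    pop[0] = [len(t) == 1 for t in candidates]
    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]  # keep the better half (elitism)
        children = parents.copy()
        children ^= rng.random(children.shape) < 1.0 / len(candidates)  # bit-flip mutation
        pop = np.vstack([parents, children])
    scores = np.array([fitness(m) for m in pop])
    best = pop[np.argmin(scores)]
    return [t for t, keep in zip(candidates, best) if keep]

# Synthetic stand-in data: rows are strain pairs, columns are substitution
# scores at four hypothetical antigenic sites; y is an antigenic similarity value.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(80, 4)).astype(float)
y = 1.0 - 0.1 * X[:, 0] - 0.15 * X[:, 1] * X[:, 2] + rng.normal(0, 0.01, 80)

linear_terms = [(i,) for i in range(X.shape[1])]
_, sse_linear = fit_sse(X, y, linear_terms)
best_terms = evolve_terms(X, y)
_, sse_evolved = fit_sse(X, y, best_terms)
print("linear SSE:", sse_linear)
print("evolved terms:", best_terms, "SSE:", sse_evolved)
```

Because the better half of each generation is carried over unchanged and the linear model is seeded into the initial population, the penalised score of the returned model can never be worse than that of the linear model. On real data the comparison would have to be made on held-out strain pairs rather than on the training fit.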