IDENTIFICATION OF THE GROWTH RATE OF VIRUS PROPAGATION

 January 3, 2018    R, PYTHON, MACHINE LEARNING, DATA ANALYSIS, MATLAB.

The primary objective of this project is to estimate how fast a virus spreads by classifying its propagation mode (constant, logistic, or exponential) with machine learning techniques, so that public health agencies have predictive information for making data-driven healthcare decisions.

This modeling-oriented research was carried out in three steps: 1) prepare the datasets and analyze the data, 2) classify the three propagation modes with machine learning methods, and 3) validate the effectiveness of our models.

Our simulated data were based on coalescent theory, a model of the distribution of gene divergence in a genealogy. The data took the form of multiple pairwise matrices of shape 50 × 50 (assuming 50 gene samples of the virus). Each value in a matrix represented the gene divergence between the virus sample indexed by its row and the one indexed by its column. Each matrix exhibited one of the three propagation modes mentioned above.
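To make the data layout concrete, here is a minimal sketch of a 50 × 50 pairwise divergence matrix. It does not reproduce the coalescent simulation used in the project: the random sequences, the sequence length of 200, and the use of Hamming distance as the divergence measure are all placeholder assumptions, and `pairwise_divergence` is a hypothetical helper name.

```python
import numpy as np

def pairwise_divergence(sequences):
    """Return a symmetric matrix of pairwise Hamming distances.

    Hamming distance is only a stand-in for the gene divergence
    produced by the coalescent simulation described in the text.
    """
    seqs = np.asarray(sequences)
    # Broadcast (n, 1, L) against (1, n, L) to compare every pair of
    # sequences site by site, then count the mismatches per pair.
    return (seqs[:, None, :] != seqs[None, :, :]).sum(axis=2)

rng = np.random.default_rng(0)
# Placeholder data: 50 random sequences of length 200 over a 4-letter alphabet.
samples = rng.integers(0, 4, size=(50, 200))
D = pairwise_divergence(samples)  # entry (i, j): divergence between samples i and j
```

The resulting matrix is symmetric with a zero diagonal, which is the structure any pairwise divergence matrix of this kind should have.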

We reordered each matrix by its divergence values and converted it into a heat map. The three propagation modes showed clearly different patterns. See the comparison below.
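One plausible reading of the reordering step is sketched here. The text does not say which ordering rule was used, so sorting samples by their mean divergence is an assumption, and `reorder_by_mean_divergence` is a hypothetical helper. The same permutation is applied to rows and columns so the matrix stays symmetric.

```python
import numpy as np

def reorder_by_mean_divergence(D):
    """Sort rows and columns with one shared permutation.

    Assumption: samples are ordered by their mean divergence to all
    other samples; the original project may have used another rule.
    """
    D = np.asarray(D, dtype=float)
    order = np.argsort(D.mean(axis=1), kind="stable")
    # np.ix_ applies the permutation to both axes at once.
    return D[np.ix_(order, order)], order

# Small illustrative matrix (made-up values, not project data).
D = np.array([[0.0, 5.0, 1.0],
              [5.0, 0.0, 4.0],
              [1.0, 4.0, 0.0]])
reordered, order = reorder_by_mean_divergence(D)
```

Passing `reordered` to `matplotlib.pyplot.imshow` would then render it as a heat map.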

Then machine learning models, such as support vector machines (SVM), neural networks (NN), and logistic regression, were trained to classify the propagation mode. Recall and accuracy were evaluated.
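The two evaluation metrics can be sketched in plain Python. The labels and predictions in this example are made up for illustration and are not results from the project; `accuracy` and `recall_per_class` are hypothetical helper names.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted mode matches the true mode."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall_per_class(y_true, y_pred):
    """For each true mode, the fraction of its samples predicted correctly."""
    recalls = {}
    for label in sorted(set(y_true)):
        total = sum(t == label for t in y_true)
        hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        recalls[label] = hits / total
    return recalls

# Illustrative labels only (not project data).
y_true = ["constant", "constant", "logistic", "logistic", "exponential", "exponential"]
y_pred = ["constant", "logistic", "logistic", "logistic", "exponential", "constant"]

acc = accuracy(y_true, y_pred)          # 4 of 6 predictions are correct
rec = recall_per_class(y_true, y_pred)  # one recall value per propagation mode
```

Reporting recall per mode alongside overall accuracy shows whether a classifier confuses one particular mode, which a single accuracy figure can hide.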