Age Prediction by DNA Methylation in Neural Networks

Document Type


Publication Date



Aging is traditionally thought to be caused by complex, interacting factors such as DNA methylation. Traditional formulas for DNA methylation age are based on linear models, and little work has explored the effectiveness of neural networks, which can learn non-linear relationships. DNA methylation data typically comprise hundreds of thousands of features but far fewer biological samples, which leads to overfitting and poor generalization in neural networks. We propose the Correlation Pre-Filtered Neural Network (CPFNN), which uses Spearman correlation to pre-filter the input features before feeding them into a neural network. We compare CPFNN with statistical regressions (i.e., Horvath's and Hannum's formulas), neural networks with LASSO and elastic net regularization, and dropout neural networks. CPFNN outperforms these models by at least 1 year in terms of Mean Absolute Error (MAE), achieving an MAE of 2.7 years. We also test for an association between epigenetic age and schizophrenia and Down syndrome (p=0.024 and p<0.001, respectively). We find that, when the number of candidate features is large, as with genome-wide DNA methylation data, a key factor in improving prediction accuracy is to appropriately weight features that are highly correlated with the outcome of interest.
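The pre-filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how many features are kept, what correlation threshold is used, or how the network is built, so the `prefilter` function, its `top_k` parameter, and the toy data below are all assumptions made for illustration. The sketch ranks each feature by the absolute Spearman correlation with age and keeps the strongest ones. In practice the filter would be fitted on training samples only, so that no information from held-out samples leaks into feature selection.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def prefilter(samples, ages, top_k):
    """Hypothetical helper: indices of the top_k features by |Spearman rho|.

    samples is a list of rows (one per sample), ages the outcome per sample.
    top_k is an assumed hyperparameter, not a value taken from the paper.
    """
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        scores.append(abs(spearman(column, ages)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_k])


# Toy data (invented): 40 samples, 50 methylation-like features,
# of which features 3 and 17 track age and the rest are noise.
rng = random.Random(0)
ages = [rng.uniform(20, 80) for _ in range(40)]
samples = []
for age in ages:
    row = [rng.random() for _ in range(50)]
    row[3] = age / 100 + rng.gauss(0, 0.01)       # rises with age
    row[17] = 1 - age / 100 + rng.gauss(0, 0.01)  # falls with age
    samples.append(row)

selected = prefilter(samples, ages, top_k=2)
# Reduced matrix that would be passed on to the neural network.
filtered = [[row[j] for j in selected] for row in samples]
```

Using the absolute value of the correlation keeps features that decrease with age as well as those that increase. Only the reduced matrix `filtered` would then be used as network input, which shrinks the input layer from the full feature space to `top_k` columns.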


PMID: 34048347





Publication Information

IEEE/ACM Transactions on Computational Biology and Bioinformatics