Age Prediction by DNA Methylation in Neural Networks

Lechuan Li, Rice University
Chonghao Zhang, Trinity University
Shiyu Liu, Trinity University
Hannah Guan, University of Minnesota Twin Cities
Yu Zhang, Trinity University

Abstract

Aging is traditionally thought to be caused by complex and interacting factors such as DNA methylation. Traditional DNA methylation aging formulas are based on linear models, and little work has explored the effectiveness of neural networks, which can learn non-linear relationships. DNA methylation data typically comprise hundreds of thousands of features but far fewer biological samples. This imbalance leads to overfitting and poor generalization in neural networks. We propose the Correlation Pre-Filtered Neural Network (CPFNN), which uses Spearman correlation to pre-filter the input features before feeding them into a neural network. We compare CPFNN with statistical regressions (i.e., Horvath's and Hannum's formulas), neural networks with LASSO regularization and elastic net regularization, and Dropout Neural Networks. CPFNN outperforms these models by at least 1 year in terms of Mean Absolute Error (MAE), with an MAE of 2.7 years. We also test for association of epigenetic age with schizophrenia and with Down syndrome (p=0.024 and p<0.001, respectively). We find that when the number of candidate features is large, as in genome-wide DNA methylation data, a key factor in improving prediction accuracy is to appropriately weight features that are highly correlated with the outcome of interest.
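
The pre-filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (ordinal_ranks, spearman_with_target, prefilter), the cutoff k, and the synthetic data are all assumptions made for illustration. Spearman correlation is computed here as the Pearson correlation of ranks, with ties broken by position rather than averaged, which is a simplification that is adequate for continuous values.

```python
import numpy as np

def ordinal_ranks(a):
    """Rank values along axis 0 (0 = smallest); ties broken by position, not averaged."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)

def spearman_with_target(X, y):
    """Spearman correlation of every column of X with y (Pearson correlation of ranks)."""
    rx = ordinal_ranks(X)
    ry = ordinal_ranks(y)
    rx -= rx.mean(axis=0)
    ry -= ry.mean()
    denom = np.sqrt((rx ** 2).sum(axis=0) * (ry ** 2).sum())
    return (rx * ry[:, None]).sum(axis=0) / denom

def prefilter(X, y, k):
    """Indices of the k features with the largest absolute Spearman correlation to y."""
    rho = spearman_with_target(X, y)
    return np.argsort(-np.abs(rho))[:k]

# Synthetic stand-in for methylation data (hypothetical): many features, few samples.
rng = np.random.default_rng(0)
n_samples, n_features = 60, 500
X = rng.random((n_samples, n_features))  # values in [0, 1), like methylation beta values

# Hypothetical "age" that depends monotonically but non-linearly on feature 3.
age = 20 + 60 * X[:, 3] ** 2 + rng.normal(0, 1, n_samples)

selected = prefilter(X, age, k=10)
X_filtered = X[:, selected]  # reduced matrix that would be passed to the neural network
```

In this sketch only the reduced matrix X_filtered would be given to the network, so the number of input weights scales with k rather than with the full feature count. How the cutoff is chosen is left open here.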