Date of Award

5-2017

Document Type

Thesis campus only

Department

Computer Science

First Advisor

Matthew A. Hibbs

Abstract

Accumulation of somatic mutations may contribute to the development of cancers and to the functional decline associated with aging. However, the rate and extent of somatic mutation accumulation in otherwise healthy cells are poorly quantified at present, with estimates ranging from 10 to 10^5 mutations per cell. Somatic mutation rates for any complex organism likely vary between heterogeneous tissues and over the course of an individual’s lifespan. To study this variation, we have collected extensive time-series DNA-seq data from diverse tissues of the well-defined B6/J strain of Mus musculus. Existing approaches for somatic mutation detection are largely designed for oncogenomics and are not entirely appropriate for whole-genome aging research. To remedy this, we have set out to create an algorithm for accurately determining the incidence rate of somatic mutations in complex DNA-seq data. Using a deep neural network model, our approach detects rare sequence variations while accounting for the systematic noise intrinsic to high-throughput sequencing technologies. With this neural network we hope to determine whether the observed somatic mutation rates are stochastic or driven by selective pressures, as this may explain how mutations accumulate differentially across subspecies.
