A Quantitative Summary Statistic for Genetic Admixture

Sultana, Mayisha Mahdiya

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp011g05ff65d

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Storey, John D.	-
dc.contributor.author	Sultana, Mayisha Mahdiya	-
dc.date.accessioned	2020-10-02T19:30:26Z	-
dc.date.available	2020-10-02T19:30:26Z	-
dc.date.created	2020-05-04	-
dc.date.issued	2020-10-02	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp011g05ff65d	-
dc.description.abstract	The admixture model is a widely used approach to evaluating the genetic ancestry of humans and other organisms. The model has successfully been used to improve the accuracy of genetic association studies, to further the understanding of human migratory history, and to help identify signatures of natural selection. Admixture occurs when individuals of two or more genetically divergent populations interbreed. The admixture model, assuming that each observed individual is derived from $d$ ancestral populations, estimates (a) the allele frequencies that define the ancestral populations, and (b) the proportions of each individual's genetic information that come from each ancestral population. The standard summary tool for the results of ancestry estimation has become the admixture barplot, a stacked barplot illustrating admixture proportions across individuals. In the genetic literature, these barplots are used to compare the ancestry profiles of distinct populations and even to inform the reconstruction of ancestral histories. Unfortunately, such usage can be extremely misleading, because a qualitative summary provides no concrete metric of similarity. It is difficult to know the error associated with ancestry estimates, and two similar-looking barplots may come from datasets that represent individuals with very different ancestry. Therefore, we need a tool that can summarize subtle differences in the underlying distribution of admixture. This thesis calls attention to the need for a quantitative summary statistic for admixture that is concise, informative about the error in the estimates, and allows a comparison of ancestry across datasets. Here, we evaluate the two most common methods of obtaining summary statistics in the field of statistics: maximum likelihood estimation and the method of moments. However, these methods fail to achieve a high level of accuracy.
To solve this problem, we propose a new summary method, the Hybrid estimator, and demonstrate that it outperforms the existing methods in accuracy. The goal of this thesis is not to replace the existing tool but to encourage the use of this summary alongside the admixture barplot, which will provide a more robust analysis of ancestry.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en	-
dc.title	A Quantitative Summary Statistic for Genetic Admixture	-
dc.type	Princeton University Senior Theses	-
pu.date.classyear	2020	-
pu.department	Molecular Biology	-
pu.pdf.coverpage	SeniorThesisCoverPage	-
pu.contributor.authorid	920089241	-
pu.certificate	Center for Statistics and Machine Learning	-
pu.certificate	Global Health and Health Policy Program	-
Appears in Collections:	Global Health and Health Policy Program, 2017; Molecular Biology, 1954-2020

Files in This Item:

File	Description	Size	Format
SULTANA-MAYISHAMAHDIYA-THESIS.pdf		4.01 MB	Adobe PDF
