Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01ns0646068
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Floudas, Christodoulos A | en_US
dc.contributor.author | Baliban, Richard Christopher | en_US
dc.contributor.other | Chemical and Biological Engineering Department | en_US
dc.date.accessioned | 2012-11-15T23:57:35Z | -
dc.date.available | 2012-11-15T23:57:35Z | -
dc.date.issued | 2012 | en_US
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01ns0646068 | -
dc.description.abstract | The field of proteomics seeks to address a grand problem in biology: the large-scale determination of the gene and cellular function of an organism by direct analysis at the protein level. Over the last decade, liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has emerged as a prominent tool within the field due to its capacity for high-throughput and high-sensitivity experimental designs. The resulting output from LC-MS/MS systems often includes thousands of MS/MS spectra, each of which is a complex piece of data that must be analyzed to extract relevant information about the proteins contained in a cellular sample. These data sets are often noisy and therefore require sophisticated and robust tools that are capable of efficiently processing the information. This thesis presents several mathematical models and algorithms that address three major areas of open research problems in proteomics: (1) post-translational modification (PTM) identification at the peptide level, (2) unmodified and modified protein identification, and (3) determination of optimal biomarker combinations. When conducting an LC-MS/MS experiment, the prime objective is the identification of a complete list of sample proteins along with all identified PTMs. This is a major challenge due to the vast increase in computational complexity that arises from introducing over 900 modifications to a typical 20 amino acid universe. Two novel algorithms were developed based on integer linear optimization for (1) the identification of a comprehensive list of all proteins and (2) the untargeted identification of all modifications along a template peptide sequence. Existing peptide identification algorithms are used to initially determine all unmodified peptides, which are input to the protein identification algorithm to determine the list of all sample proteins. An untargeted search for all modified amino acid sites within the protein list is then performed using a universal set of all PTMs. These algorithms demonstrate superior accuracy on both small and large-scale data sets when benchmarked against existing state-of-the-art methods. The complete suite of algorithms was fully integrated into a webtool that was made freely available to the scientific community. Using the above algorithms, gingival crevicular fluid (GCF) samples were analyzed to identify novel biomarker combinations of proteins that could effectively diagnose individuals who are either periodontally healthy (PH) or afflicted with chronic periodontitis (CP). Analysis of a training set of 12 PH and 12 CP samples identified 432 human and 30 bacterial proteins, 150 of which were not previously identified in large-scale proteomics analysis. GCF samples were obtained from 72 additional subjects, and a mixed-integer optimization model was developed to identify the optimal combination of biomarkers for diagnosis of PH or CP individuals. A thorough cross-validation of the model capability was performed on a training set of 55 samples, and greater than 99% accuracy was consistently achieved. The model was then tested on two blind test sets, and using an optimal combination of 7 human proteins and 3 bacterial proteins, the model was able to correctly predict 40 out of 41 PH and CP samples. | en_US
dc.language.iso | en | en_US
dc.publisher | Princeton, NJ : Princeton University | en_US
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog (http://catalog.princeton.edu). | en_US
dc.subject | Biomarkers | en_US
dc.subject | Gingival Crevicular Fluid | en_US
dc.subject | Integer Linear Optimization | en_US
dc.subject | Post-translational Modifications | en_US
dc.subject | Protein Identification | en_US
dc.subject | Tandem Mass Spectrometry | en_US
dc.subject.classification | Chemical engineering | en_US
dc.subject.classification | Molecular biology | en_US
dc.subject.classification | Bioinformatics | en_US
dc.title | High-throughput methods for in silico discovery of peptides, proteins, and post-translational modifications in proteomics | en_US
dc.type | Academic dissertations (Ph.D.) | en_US
pu.projectgrantnumber | 690-2143 | en_US
Appears in Collections: Chemical and Biological Engineering

Files in This Item:
File | Description | Size | Format
Baliban_princeton_0181D_10425.pdf | | 1.26 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.