Hiromi W.L. Koh1, Hannah L.F. Swa2, Damian Fermin3, Siok Ghee Ler2, Jayantha Gunaratne2,4, Hyungwon Choi1,2
1 Saw Swee Hock School of Public Health, National University of Singapore and National University Health System
2 Institute of Molecular and Cell Biology, A*STAR, Singapore
3 Department of Pathology, Yale University, New Haven, CT 06511, USA
4 Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Published online in Proteomics on 24 April 2015.
Labeling-based proteomics is a powerful method for detection of differentially expressed proteins (DEPs). Current data analysis platforms typically rely on protein-level ratios, which are obtained by summarizing peptide-level ratios for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this reproducibility information is not incorporated into the differential expression (DE) analysis. Here, we propose a novel probabilistic framework, EBprot, that directly models the peptide-protein hierarchy and rewards proteins with reproducible evidence of DE over multiple peptides. To evaluate its performance with known DE states, we conducted a simulation study, which shows that the peptide-level analysis of EBprot provides better receiver operating characteristic curves and more accurate estimation of false discovery rates than methods based on protein-level ratios. We also demonstrate superior classification performance of peptide-level EBprot analysis in a spike-in dataset. To illustrate the wide applicability of EBprot to different experimental designs, we applied EBprot to a dataset for lung cancer subtype analysis with biological replicates and to another dataset for time course phosphoproteome analysis of EGF-stimulated HeLa cells with multiplexed labeling. Through these examples, we show that the peptide-level analysis of EBprot is a robust alternative to existing statistical methods for the DE analysis of labeling-based quantitative datasets.
Availability: The software suite is freely available on the SourceForge website.
Figure legend: (A) The scoring framework in EBprot. The peptide ratio data are modelled as repeated measures of protein-level relative quantitation between different samples, and the DEPs are selected based on the probability derived from the model. (B) The scoring framework can be utilized for the analysis of multiple comparisons in different ways depending on the experimental design. Note that a “sample” in the figure refers to two or more biological samples or conditions that are differentially labeled.
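The idea in panel (A), that peptide ratios act as repeated measures of one protein-level state, can be illustrated with a small sketch. This is not the EBprot model or its code: it assumes a simple three-component Gaussian mixture (unchanged, up, down) on log2 peptide ratios, treats peptides as independent given the protein state, and uses hypothetical fixed parameters (`prior_de`, `null_sd`, `de_shift`, `de_sd`) where a real empirical Bayes method would estimate such quantities from the data. The function name `posterior_de` is likewise made up for illustration.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution, computed directly to avoid underflow."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def posterior_de(log_ratios, prior_de=0.1, null_sd=0.3, de_shift=1.0, de_sd=0.5):
    """Posterior probability that a protein is differentially expressed.

    log_ratios: log2 peptide ratios belonging to one protein.
    All parameter values are hypothetical and fixed here for illustration only.
    """
    # Start each protein state from its prior (DE mass split evenly up/down).
    log_null = math.log(1.0 - prior_de)
    log_up = math.log(prior_de / 2.0)
    log_down = math.log(prior_de / 2.0)

    # Each peptide is a repeated measure: its log-likelihood is added
    # to every candidate protein state.
    for r in log_ratios:
        log_null += log_normal_pdf(r, 0.0, null_sd)
        log_up += log_normal_pdf(r, de_shift, de_sd)
        log_down += log_normal_pdf(r, -de_shift, de_sd)

    # Normalise with the log-sum-exp trick for numerical stability.
    m = max(log_null, log_up, log_down)
    total = (math.exp(log_null - m) + math.exp(log_up - m)
             + math.exp(log_down - m))
    return 1.0 - math.exp(log_null - m) / total


if __name__ == "__main__":
    # Same average ratio, different numbers of supporting peptides.
    print("1 peptide :", posterior_de([1.0]))
    print("4 peptides:", posterior_de([1.0, 1.0, 1.0, 1.0]))
    print("unchanged :", posterior_de([0.05, -0.1, 0.0]))
```

In this sketch, two proteins with the same average ratio receive different scores: the one supported by four concordant peptides gets a higher posterior probability than the one supported by a single peptide. A single protein-level summary ratio would not distinguish them.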