Submitted on March 11, 2008
Revised on June 25, 2008
Accepted on August 13, 2008
Spectral dictionaries: Integrating de novo peptide sequencing with database search of tandem mass spectra
Sangtae Kim, Nitin Gupta, Nuno Bandeira, and Pavel A. Pevzner
Bioinformatics, University of California San Diego, La Jolla, CA 92037
Corresponding Author: ppevzner{at}ucsd.edu
Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach in which all plausible de novo interpretations of a spectrum (its spectral dictionary) are generated and then quickly matched against the database. We present a new algorithm, MS-Dictionary, for efficiently generating spectral dictionaries and demonstrate that it can identify spectra that are missed by database search. We argue that MS-Dictionary enables proteogenomic searches in six-frame translations of genomic sequences, searches that may be prohibitively time-consuming for existing database search approaches. We show that such searches allow one to correct sequencing errors and find programmed frameshifts.
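To make the second half of this approach concrete, the following minimal sketch illustrates only the lookup step: given a spectral dictionary that has already been generated, find the entries that occur in a protein database. It is not the MS-Dictionary algorithm itself, which concerns how the dictionary is generated. The function names (`index_database`, `match_dictionary`), the in-memory substring index, and the toy sequences are all assumptions made for illustration. The sketch also uses exact string matching, so it does not treat isobaric residues such as leucine and isoleucine as equivalent.

```python
from typing import Dict, Iterable, List, Tuple

Occurrence = Tuple[str, int]  # (protein id, 0-based offset)


def index_database(proteins: Dict[str, str], length: int) -> Dict[str, List[Occurrence]]:
    """Map every substring of the given length to the places it occurs."""
    index: Dict[str, List[Occurrence]] = {}
    for protein_id, sequence in proteins.items():
        for start in range(len(sequence) - length + 1):
            index.setdefault(sequence[start:start + length], []).append((protein_id, start))
    return index


def match_dictionary(dictionary: Iterable[str],
                     proteins: Dict[str, str]) -> Dict[str, List[Occurrence]]:
    """Return the dictionary peptides that occur in the database, with locations.

    One substring index is built per distinct peptide length (hypothetical
    design choice), so each peptide costs a single hash lookup.
    """
    indexes: Dict[int, Dict[str, List[Occurrence]]] = {}
    hits: Dict[str, List[Occurrence]] = {}
    for peptide in dictionary:
        length = len(peptide)
        if length not in indexes:
            indexes[length] = index_database(proteins, length)
        occurrences = indexes[length].get(peptide)
        if occurrences:
            hits[peptide] = occurrences
    return hits


# Toy data, invented for illustration only.
proteins = {"P1": "MKTAYIAKQR", "P2": "GGSAYIAKLL"}
dictionary = ["AYIAK", "AYLAK", "TAYIA"]
hits = match_dictionary(dictionary, proteins)
```

Because each candidate peptide is resolved by a hash lookup, the cost of the matching step in this sketch grows with the size of the dictionary and not with a per-spectrum scan of the database. The same lookup would apply unchanged if `proteins` held the six reading-frame translations of a genome in place of annotated protein sequences.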