Validating PSMs

Validating Magnum PSMs

Magnum reports the best peptide-spectrum match (PSM) for each spectrum analyzed. Each PSM is simply the theoretical sequence that best matches the observed spectrum peaks, and it is not necessarily the correct sequence. There are many reasons why the correct PSM may not have been found (e.g., the sequence was not in your database, the peptide carries an unknown PTM, or the spectrum quality was poor), so the reported PSMs must be statistically validated.

Magnum results come pre-formatted for use with Percolator, a semi-supervised machine learning algorithm that validates PSMs at a user-defined false discovery rate (FDR). Percolator is developed and maintained by the Käll Lab at KTH and is available as a free download. A full discussion of Percolator and its features is beyond the scope of these instructions; basic instructions are provided below.

Advanced Users

There are many alternatives to Percolator for validating PSMs. If you prefer an alternative, all the parameters and scoring metrics from the Magnum algorithm are provided in the basic Magnum results file. These results should be parsed and formatted for your algorithm of choice.

Running Percolator

Percolator is run from the command line. Its syntax and available options change between versions; these instructions describe Percolator version 2.08. To list the command-line options for your installed version, run Percolator with the -h option.

Run Percolator from the directory containing your Magnum results. To validate Magnum results, provide the Percolator-formatted file as input and redirect the output to a new file:

$ /usr/bin/percolator data_set1.perc.txt > data_set1.validated.txt

In the example above, Percolator analyzes the PSMs from Magnum and reports a non-redundant list of peptides. Each peptide is assigned a q-value, which can be used to filter the results at a chosen false discovery rate.
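Filtering at a given FDR amounts to keeping the rows whose q-value is at or below the threshold (e.g., 0.01 for a 1% FDR). The sketch below does this in Python. It assumes the validated output is tab-delimited with a header row containing a column named "q-value"; that column name is an assumption, so check the header written by your Percolator version and adjust `qcol` if needed.

```python
import csv


def filter_by_qvalue(lines, threshold=0.01, qcol="q-value"):
    """Keep rows of tab-delimited Percolator output with q-value <= threshold.

    `lines` is any iterable of text lines (e.g. an open file handle).
    Assumes the first line is a header that contains `qcol`.
    Returns (header, kept_rows).
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    q_index = header.index(qcol)  # raises ValueError if the column is absent
    kept = []
    for row in reader:
        # Skip blank lines; rows may carry extra trailing protein columns,
        # which is harmless because only q_index is read.
        if row and float(row[q_index]) <= threshold:
            kept.append(row)
    return header, kept
```

For example, passing an open handle to data_set1.validated.txt with `threshold=0.01` returns the peptides that pass at an estimated 1% FDR.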

Interpreting Percolator Results

Percolator results are tab-delimited and can be viewed with spreadsheet software. Detailed instructions are found in the Percolator Results section.

Additional Information

Percolator trains its parameters on a subset of the input PSMs. If the number of input PSMs is low (e.g., fewer than 100), the train-and-test procedure may fail, and Percolator will report an error. Even if training succeeds with few PSMs, the results may not accurately model the true error distribution, and Percolator may issue a warning. When the input contains few PSMs, an alternative form of validation may be necessary.
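Before running Percolator, it can help to count the target and decoy PSMs in the input file. The Python sketch below assumes the Magnum .perc.txt file follows Percolator's tab-delimited input layout: a header row with a "Label" column (1 for targets, -1 for decoys) and an optional "DefaultDirection" row. The 100-PSM cutoff only mirrors the rough guideline above; it is not a Percolator setting.

```python
import csv
from collections import Counter

# Rough guideline taken from the text above; not a Percolator constant.
MIN_PSMS = 100


def count_labels(lines, label_col="Label"):
    """Count target (label 1) and decoy (label -1) PSMs in tab-delimited input."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    idx = header.index(label_col)
    counts = Counter()
    for row in reader:
        if not row or row[0] == "DefaultDirection":
            continue  # skip blank lines and the optional default-direction row
        label = row[idx].strip()
        if label == "1":
            counts["target"] += 1
        elif label == "-1":
            counts["decoy"] += 1
    return counts


def too_few_psms(counts, minimum=MIN_PSMS):
    """Return True if the total number of PSMs is below `minimum`."""
    return counts["target"] + counts["decoy"] < minimum
```

A small total suggests that Percolator's training may fail or give unreliable q-values.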

Percolator requires input PSMs labeled as both targets and decoys. Magnum must therefore be run against a database that includes decoy protein sequences, and the decoy_filter parameter must be set to a tag that appears in the description of every decoy protein sequence. Additional details can be found in the Configuration instructions. If Magnum is not run this way, Percolator will be unable to assign q-values to the PSMs.
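One way to confirm that the database is suitable is to count how many FASTA entries carry the decoy tag. The Python sketch below assumes the tag is matched as a plain substring of the FASTA description line, as the text above describes; the tag "DECOY_" used in the example is hypothetical, so substitute the value you set for decoy_filter.

```python
def check_decoy_tag(fasta_lines, decoy_tag):
    """Count FASTA entries, and those whose description contains decoy_tag.

    `fasta_lines` is any iterable of text lines from a FASTA file.
    Returns (total_entries, decoy_entries).
    """
    total = 0
    decoys = 0
    for line in fasta_lines:
        if line.startswith(">"):  # description (header) line of an entry
            total += 1
            if decoy_tag in line:
                decoys += 1
    return total, decoys
```

If the decoy count is zero, either the database lacks decoy sequences or decoy_filter does not match their descriptions; in both cases Percolator will not be able to assign q-values.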