Magnum Configuration

Configuring Magnum

Due to the large number of parameters used in any Magnum analysis, all parameters are set in a configuration file rather than typed out on the command line. A sample configuration file is provided with Magnum, and can also be copied from this example.

Parameters can be edited in any plain-text (ASCII) editor, such as vim on Linux or Notepad on Windows. For ease of reading, parameter names use English words instead of single letters. Notes are any text that appears after the # symbol on a line. Notes are not processed as parameters; they are there to provide helpful information to users. Feel free to add notes as needed for your analysis.
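The note convention can be illustrated with a short Python sketch. This is not Magnum's own parser; the function name parse_config is hypothetical, and the sketch assumes each parameter sits on one line in the form name = value, as in the examples on this page.

```python
def parse_config(text):
    """Return (name, value) pairs, ignoring notes that start with '#'.

    A list of pairs is used instead of a dict because some parameters
    (for example, modification) may appear on more than one line.
    """
    params = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop the note, if any
        if not line or "=" not in line:
            continue  # blank line or note-only line
        name, value = line.split("=", 1)
        params.append((name.strip(), value.strip()))
    return params


example = """
# Data I/O
database = polymerase_proteins.fasta   # target and decoy sequences
MS_data_file = my_data.mzML
fixed_modification = C 57.02146
modification = M 15.9949
"""

for name, value in parse_config(example):
    print(name, "->", value)
```

Lines that contain only a note produce no parameter, so notes can be added freely without affecting the analysis.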

Helpful Hint

Make a different configuration file for each analysis you wish to perform. Simply copy an existing configuration file, then give it a unique name and adjust the parameters. Thus, you preserve the unique parameter settings for each of your analyses.

For clarity, the parameters have been divided into six groups by similarity. This page will describe the most important parameters within those groups. A complete list of all parameters and their descriptions can be found on the Parameters page.

1. Data I/O

The Data I/O parameters describe the files that Magnum imports and exports during analysis. Two key files are required for input, your MS data file and a protein sequence database:

database = polymerase_proteins.fasta
MS_data_file = my_data.mzML

There are two points worth mentioning in the above example. First, the protein sequence database is an ASCII text file in FASTA format. It should contain both target proteins (sequences you hope to find) and decoy proteins (sequences you know are false). Second, the mass spectrometry data file must be in a supported format. On all systems, those formats are mzML (recommended) and mzXML. Additionally, Thermo RAW files may be used on Windows systems if the appropriate drivers have been installed.

Helpful Hint

FASTA sequence database files should contain both target and decoy sequences if you wish to use Percolator for results validation. Below is an example of what a target protein sequence and its matching decoy sequence might look like:

>Protein_1
MTHISISMYPRTEINSEKWENCE
>Decoy_1_reverse
ECNEWKESNIETRPYMSISIHTM
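The decoy in this example is the target sequence reversed. A minimal Python sketch of that approach follows; the helper names are hypothetical, and sequence reversal is only one common way to build decoys.

```python
def reverse_decoy(sequence):
    """Return the reversed amino acid sequence."""
    return sequence[::-1]


def fasta_entry(label, sequence):
    """Format one FASTA record: a '>' label line followed by the sequence."""
    return f">{label}\n{sequence}\n"


target = "MTHISISMYPRTEINSEKWENCE"
decoy = reverse_decoy(target)

# Write the target followed by its matching decoy.
print(fasta_entry("Protein_1", target) + fasta_entry("Decoy_1_reverse", decoy), end="")
```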

Helpful Hint

Magnum looks for files in the current working directory, i.e. the directory you are in. You can provide relative paths to parent directories and subdirectories if desired. To ensure that input and output are read from and written to a specific location no matter which directory you are in, provide an absolute path with your files:

database = C:\databases\e_coli.fasta

The remaining parameters in this group define exported content after Magnum analysis is complete. All output is in tab-delimited ASCII text files. The standard output file contains a complete summary of the results in a single file:

output_file = my_results.txt

The Percolator parameters return the same results, but formatted such that they can be directly imported to Percolator for validation of the peptide spectrum matches (PSMs):

percolator_file = my_results_perc_format
percolator_version = 2.07

For the percolator_file parameter, no extension is necessary. The percolator_version number should match the version of Percolator you will be using, as input formats differ between Percolator versions. Magnum will automatically format the results to match the version you specify.

2. Data Descriptors

The Data Descriptor parameters describe the spectra to be analyzed. Accurate description of the spectra ensures the correct spectrum processing is performed during the Magnum analysis.

During Magnum analysis, all profile MS peaks must be converted to centroid MS peaks. Magnum has built-in functionality to perform this operation, but requires additional information about the instrument used and the resolution settings. Correctly setting these values will produce the most accurate centroid peak conversion. First, identify whether or not the spectra are centroided at the MS and MS/MS levels:

MS1_centroid = 0
MS2_centroid = 1

In the above example, 0 indicates profile peaks and 1 indicates centroid peaks. The parameters thus tell Magnum that the MS spectra are profile, and the MS/MS spectra are centroid. As a result, only the MS peaks will be converted to centroid values during the analysis. However, Magnum will need additional information about the peaks, such as what instrument they were acquired on, and at what resolution:

instrument = 0
MS1_resolution = 55000

Here, the 0 for instrument indicates an Orbitrap mass analyzer. The resolution is the approximate value at 400 m/z, which may differ from the m/z value at which the manufacturer reports resolution. Make sure you report the resolution at 400 m/z. MS2_resolution is not used in this example because we set MS2_centroid = 1, so any value for that parameter is ignored during analysis.
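If your instrument software reports resolution at a different m/z, the value can be rescaled. The Python sketch below assumes the usual Orbitrap behavior, in which resolution falls off with the square root of m/z. The function name is hypothetical, and other analyzer types scale differently, so treat the result as an approximation.

```python
import math


def orbitrap_resolution_at_400(resolution, reported_mz):
    """Rescale an Orbitrap resolution reported at reported_mz to 400 m/z.

    Assumes resolution is proportional to 1 / sqrt(m/z).
    """
    return resolution * math.sqrt(reported_mz / 400.0)


# Example: a resolution setting of 70000 reported at 200 m/z.
print(round(orbitrap_resolution_at_400(70000, 200.0)))
```

The rounded result is the value you would enter for MS1_resolution.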

Helpful Hint

Don't want to worry about setting the correct parameters to convert profile peaks to centroid? Simply give Magnum a data file where all the peaks are already centroided. With the right tool, such as msconvert from ProteoWizard, you can do this when converting your raw data to an mzML file.

The enrichment parameter should be set to 0 unless you performed peptide C-terminus labeling during the enzymatic digestion step of your sample prep. Details of how and when to use this parameter are found on the Parameters page.

3. Open Modifications and Adducts

The characteristic feature of Magnum is its ability to detect unspecified modification masses on peptides analyzed by mass spectrometry and, where possible, localize them on the peptide backbone. When combined with some a priori knowledge, these capabilities become powerful tools for detecting novel chemical moieties and adducts resulting from treatments to your protein samples.

If your adduct masses are known, they are best identified through standard modification analyses. However, if the precise adduct mass is unknown, a range of potential adduct masses is specified with two simple parameters:

min_adduct_mass = 50.0
max_adduct_mass = 300.0

In the example above, any open modification mass between 50 and 300 Da will be searched on all peptide sequences. If a peptide with an open modification in this range scores higher than any other peptide sequence (including those without modifications or more typical variable modifications), then this open-modified PSM will be reported in the Magnum results.
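The mass window can be pictured as a simple test on the difference between the observed precursor mass and the unmodified peptide mass. The Python sketch below is illustrative only: the function name is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def adduct_mass_in_range(precursor_mass, peptide_mass,
                         min_adduct_mass=50.0, max_adduct_mass=300.0):
    """Return True if the unexplained mass falls inside the open search window.

    The unexplained mass is the neutral precursor mass minus the neutral
    peptide mass (including any fixed or differential modifications).
    """
    adduct_mass = precursor_mass - peptide_mass
    return min_adduct_mass <= adduct_mass <= max_adduct_mass


print(adduct_mass_in_range(1200.0, 1000.0))  # 200 Da difference, inside the window
print(adduct_mass_in_range(1020.0, 1000.0))  # 20 Da difference, below the minimum
```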

A feature unique to Magnum is the ability to localize the open modification or adduct to a particular peptide residue or set of residues, including the protein termini. Localization has several advantages, foremost in restricting the open search space to a realistic subset of peptides. This feature is especially useful if your adduct has reactive sites that limit its interactions to particular chemical moieties. Localization may also help in downstream curation of the results. To set localization, use the following parameter:

adduct_sites = nK

The example above limits the open modification search space to primary amines (lysines and the protein N-terminus). For a fully open modification search, specify all amino acids. Localization will still occur if there is fragmentation evidence to place the adduct on a particular residue.

One final note on adduct localization. Explicit localization to a specific residue is only made if there is fragment ion evidence for the localization. If there is no such evidence (e.g. the fragment ions are observed only without the adduct, which is common among labile adducts), then the adduct mass is still reported to the user, but without any localization indicators, even if there is only a single possible site for the adduct to bind.
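To see how adduct_sites = nK restricts the search space, the Python sketch below lists the positions in a peptide where an adduct would be allowed. The function and its starts_protein flag are hypothetical; the sketch assumes that a lowercase n stands for the protein N-terminus and that uppercase letters are amino acids, as in the example above.

```python
def candidate_adduct_sites(peptide, adduct_sites="nK", starts_protein=False):
    """Return allowed adduct positions in a peptide.

    Positions are 1-based residue numbers; 0 stands for the protein
    N-terminus, which is only available when the peptide starts the protein.
    """
    residues = {c for c in adduct_sites if c.isupper()}
    sites = []
    if "n" in adduct_sites and starts_protein:
        sites.append(0)
    for position, residue in enumerate(peptide, start=1):
        if residue in residues:
            sites.append(position)
    return sites


print(candidate_adduct_sites("MKTAYK", "nK", starts_protein=True))
print(candidate_adduct_sites("AAGR", "nK"))  # no lysine, not at the protein N-terminus
```

A peptide with no allowed site, such as the second one, would not be considered for an open modification under this setting.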

4. Modifications

The Modifications parameters refer to amino acid modifications that are independent of the open modification/adduct searches. These occur naturally (e.g. post-translational modifications) or as a result of sample preparation (e.g. carbamidomethylcysteine):

fixed_modification = C 57.02146
modification = M 15.9949
max_mods_per_peptide = 2

The fixed_modification and modification parameters each require two values: a target amino acid and a mass. The fixed_modification parameter specifies a static mass that is always added to the normal mass of the indicated amino acid. The modification parameter specifies a differential mass that may or may not be present on the amino acid. Because each differential modification exponentially increases search time, max_mods_per_peptide limits the number of differential modifications per peptide to a practical number. You can also specify multiple fixed_modification and modification lines to indicate the different mass modifications you expect to find in your analysis.

Helpful Hint

You can specify multiple differential modifications for the same amino acid. Place each modification on a separate line, but indicate the same amino acid with a different mass.
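The effect of max_mods_per_peptide on search time can be seen by counting the modified forms of a single peptide. The Python sketch below is an illustration, not Magnum's implementation; the function name is hypothetical, and it assumes at most one differential modification per residue.

```python
from itertools import combinations


def modified_forms(peptide, modifications, max_mods_per_peptide=2):
    """Enumerate the differential modification states of a peptide.

    modifications is a list of (amino_acid, mass) pairs. Each returned form
    is a tuple of (position, mass) pairs; the empty tuple is the
    unmodified peptide.
    """
    # Every (position, mass) pair where a modification could be placed.
    sites = [(i, mass)
             for i, aa in enumerate(peptide)
             for target, mass in modifications
             if aa == target]
    forms = []
    for count in range(max_mods_per_peptide + 1):
        for combo in combinations(sites, count):
            positions = [i for i, _ in combo]
            if len(set(positions)) == len(positions):  # one mass per residue
                forms.append(combo)
    return forms


oxidation = [("M", 15.9949)]
for limit in (1, 2, 3):
    print(limit, len(modified_forms("MAMKM", oxidation, limit)))
```

Each additional allowed modification adds more forms to score for every peptide, which is why a small limit keeps the search practical.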

5. Scoring Algorithm

The Scoring Algorithm parameters describe the tolerances applied to the scoring algorithm for analyzing MS/MS spectra of different resolutions and mass accuracy. High-resolution MS/MS spectra use a very small mass tolerance and no mass offset. Below is an example of values appropriate for high-resolution (e.g. Orbitrap) MS/MS spectra:

fragment_bin_size = 0.03
fragment_bin_offset = 0.0

Low-resolution (e.g. ion trap) MS/MS spectra need a much wider mass tolerance, plus an offset so that peak values are most likely to fall near the center of the mass tolerance window. For low-resolution MS/MS spectra, the following parameter values are recommended:

fragment_bin_size = 1.0005
fragment_bin_offset = 0.4
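The role of the offset can be illustrated with a binning convention used by some cross-correlation search engines, in which a fragment mass is assigned to bin int(mass / bin_size + 1 - offset). Whether Magnum uses exactly this formula is an assumption, and the function name below is hypothetical.

```python
def fragment_bin(mass, fragment_bin_size, fragment_bin_offset):
    """Assign a fragment mass to an integer bin (assumed convention)."""
    return int(mass / fragment_bin_size + (1.0 - fragment_bin_offset))


# Two low-resolution measurements of what is probably the same fragment.
a, b = 500.20, 500.30

# With the recommended offset, both masses land in the same bin.
print(fragment_bin(a, 1.0005, 0.4), fragment_bin(b, 1.0005, 0.4))

# With no offset, a bin edge falls between them and they are split.
print(fragment_bin(a, 1.0005, 0.0), fragment_bin(b, 1.0005, 0.0))
```

Under this convention the offset shifts the bin edges away from the masses where peptide fragments cluster, so small measurement errors are less likely to move a peak into a neighboring bin.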

6. Analysis

The Analysis parameters are the largest set of parameters, but are easy to understand and use. Not all parameters are listed in these examples. You can find the complete list on the Parameters page.

enzyme = [KR]|{P}
max_miscleavages = 2

The above two parameters indicate the trypsin enzyme cleavage rule and the number of missed cleavages allowed.
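The rule [KR]|{P} reads as: cleave after K or R, unless the next residue is P. The Python sketch below applies that rule together with a missed cleavage limit. It is hard-coded for this one rule rather than parsing the enzyme string, and the function name is hypothetical.

```python
import re


def digest(protein, max_miscleavages=2):
    """Return tryptic peptides, allowing up to max_miscleavages missed sites."""
    if not protein:
        return []
    # Cut after K or R when the following residue is not P.
    cut_points = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    boundaries = [0] + [c for c in cut_points if c < len(protein)] + [len(protein)]
    peptides = []
    for i in range(len(boundaries) - 1):
        # Joining j - i - 1 adjacent fragments means that many missed cleavages.
        last = min(i + 2 + max_miscleavages, len(boundaries))
        for j in range(i + 1, last):
            peptides.append(protein[boundaries[i]:boundaries[j]])
    return peptides


print(digest("AAKGGRPCCKDD", max_miscleavages=0))
print(digest("AAKGGRPCCKDD", max_miscleavages=1))
```

Note that the R in GGRPCCK is not a cleavage site because it is followed by P.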

min_peptide_mass = 500.0
max_peptide_mass = 6000.0
min_peptide_length = 7
max_peptide_length = 40

The peptide mass boundaries shown above are important to understand, but once set, you will probably use the same values for all Magnum analyses. These parameters determine the smallest and largest peptides to be searched, by mass in Daltons, not by number of amino acids. Additionally, residue counts are used to limit peptides to those of reasonable lengths regardless of mass. For example, with the settings above, if a peptide has a mass greater than 500 Da but is only six residues in length, it will not be searched.
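The two pairs of limits act together: a peptide must satisfy both the mass range and the length range. A small Python sketch of that combined test follows; the function name is hypothetical, and treating the limits as inclusive is an assumption.

```python
def passes_peptide_filters(mass, length,
                           min_peptide_mass=500.0, max_peptide_mass=6000.0,
                           min_peptide_length=7, max_peptide_length=40):
    """Return True only if both the mass and the length are within limits."""
    mass_ok = min_peptide_mass <= mass <= max_peptide_mass
    length_ok = min_peptide_length <= length <= max_peptide_length
    return mass_ok and length_ok


print(passes_peptide_filters(mass=812.4, length=6))   # heavy enough, but too short
print(passes_peptide_filters(mass=812.4, length=7))   # passes both tests
```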

ppm_tolerance_pre = 25.0
prefer_precursor_pred = 1

These two parameters help define the search space for each spectrum. ppm_tolerance_pre defines the tolerance around the precursor mass, in parts per million, within which a PSM must fall to be accepted. Setting a value similar to your instrument mass accuracy is recommended, but there are various reasons to use much larger or smaller values that are outside the scope of this manual. Sometimes, such as with Thermo RAW data, a suggested monoisotopic precursor mass is provided for a spectrum. It is often helpful to use this value, which is indicated by setting prefer_precursor_pred to 1. If you set the value to 0, then Magnum will compute new monoisotopic precursor masses for the spectra. Regardless of this parameter setting, if no precursor monoisotopic masses were previously assigned to the spectra, Magnum will compute them.
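Because the tolerance is relative, the absolute window in Daltons grows with the precursor mass. The Python sketch below shows the standard parts-per-million calculation; the function names are hypothetical.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Mass error in parts per million, relative to the theoretical mass."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, ppm_tolerance_pre=25.0):
    """Return True if the precursor mass error is inside the tolerance."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= ppm_tolerance_pre


# At 2000 Da, 25 ppm corresponds to a window of about +/- 0.05 Da.
print(within_tolerance(2000.04, 2000.0))  # about 20 ppm
print(within_tolerance(2000.06, 2000.0))  # about 30 ppm
```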

spectrum_processing = 1
max_spectrum_peaks = 0

The two parameters above determine if and how MS/MS spectra are processed. spectrum_processing can be turned off by setting the value to 0. As spectrum_processing is only intended for high-resolution MS/MS, it should be turned off for low-resolution MS/MS data. Processing the spectra involves collapsing isotope distributions to the monoisotopic peak, which improves the performance of the scoring algorithm. Additionally, you can remove presumed noise peaks by using the max_spectrum_peaks parameter to limit the number of peaks analyzed after collapsing. For example, setting this parameter to 200 will limit the analysis to only the tallest 200 peaks in the spectrum. Setting the value to 0 will analyze all peaks in the spectrum.
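The peak limit can be sketched in Python as follows. This illustrates the idea only and is not Magnum's implementation; the function name is hypothetical, and peaks are represented as (m/z, intensity) pairs.

```python
def keep_tallest_peaks(peaks, max_spectrum_peaks=0):
    """Keep only the most intense peaks; 0 means keep every peak.

    peaks is a list of (mz, intensity) pairs. The result is returned in
    m/z order.
    """
    if max_spectrum_peaks <= 0 or len(peaks) <= max_spectrum_peaks:
        return sorted(peaks)
    tallest = sorted(peaks, key=lambda peak: peak[1], reverse=True)
    return sorted(tallest[:max_spectrum_peaks])


spectrum = [(100.0, 5.0), (200.0, 50.0), (300.0, 1.0), (400.0, 20.0)]
print(keep_tallest_peaks(spectrum, 2))
print(keep_tallest_peaks(spectrum, 0))
```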

decoy_filter = DECOY

This last parameter is critical for downstream validation using Percolator. It defines a case-sensitive word that is found only in the labels of your decoy database sequences. The decoy word can be any that you prefer, but make sure it does not appear in the labels of your target sequences, or they will incorrectly be assigned as decoys.
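The case sensitivity is easy to overlook. The Python sketch below assumes the decoy word is matched as a plain case-sensitive substring of the sequence label; the function name is hypothetical.

```python
def is_decoy(label, decoy_filter="DECOY"):
    """Return True if the case-sensitive decoy word appears in the label."""
    return decoy_filter in label


print(is_decoy("DECOY_Protein_1"))           # matches the decoy word
print(is_decoy("Protein_1"))                 # target sequence
print(is_decoy("Decoy_1_reverse"))           # different case, not matched
print(is_decoy("Decoy_1_reverse", "Decoy"))  # matched once the filter agrees
```

Under this assumption, a database labeled like the FASTA example earlier on this page (Decoy_1_reverse) would need decoy_filter = Decoy rather than DECOY.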