1. Software installation:
Download ProteoCombiner by clicking on the Download button at https://proteocombiner.pasteur.fr.
The following workflow demonstrates how to combine proteomics data using ProteoCombiner.
2.1. Execute the ProteoCombiner tool (Figure 1)
2.2. Specify the directory containing all result files from bottom-up proteomics experiments. This directory can also contain the database used in the search and the original RAW files in one of these formats: mzML 1.1.0, MS2, Mascot Generic Format (MGF), ABSciex®, Agilent®, Waters® and Thermo® RAW.
2.2.1. Bottom-up proteomics
2.2.1.1. For PatternLab for Proteomics and Comet output files, we recommend using the SEPro tool to filter the results and using the *.sepr file(s) as input to ProteoCombiner.
2.2.1.2. For MaxQuant output files, we recommend using the txt folder, which contains the following required files: proteinGroups.txt, peptides.txt and msms.txt.
2.2.1.3. We recommend using the FASTA and/or XML database formats obtained from UniProt.
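Before launching ProteoCombiner, the bottom-up input directory can be checked against the recommendations above. The following is a minimal sketch, not part of ProteoCombiner: the function names are hypothetical, and it only assumes that the three MaxQuant files and any *.sepr files sit directly in the given folder.

```python
from pathlib import Path

# File names required for MaxQuant input, as listed in the step above.
REQUIRED_MAXQUANT_FILES = ("proteinGroups.txt", "peptides.txt", "msms.txt")


def missing_maxquant_files(txt_dir):
    """Return the required MaxQuant files that are absent from txt_dir."""
    folder = Path(txt_dir)
    return [name for name in REQUIRED_MAXQUANT_FILES
            if not (folder / name).is_file()]


def find_sepro_files(input_dir):
    """Return all *.sepr files found directly in input_dir, sorted by path."""
    return sorted(Path(input_dir).glob("*.sepr"))
```

An empty list from `missing_maxquant_files` means the txt folder contains all three required files.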
2.3. Specify the directory containing all result files from top-down proteomics experiments. This directory can also contain the database used in the search and the original RAW files in one of these formats: mzML 1.1.0, MS2, Mascot Generic Format (MGF), ABSciex®, Agilent®, Waters® and Thermo® RAW.
2.3.1. Top-Down proteomics
2.3.1.1. For ProSightPD results, we recommend exporting only PSM identifications (*_PSM.txt), which contain all information for each proteoform. This file must have the following columns: Checked, Confidence, Identifying Node Type, Identifying Node, Search ID, Identifying Node No, PSM Ambiguity, Sequence, Annotated Sequence, Modifications, # Protein Groups, # Proteins, Master Protein Accessions, Protein Accessions, Protein Descriptions, # Missed Cleavages, Charge, Original Precursor Charge, DeltaScore, DeltaCn, Rank, Search Engine Rank, m/z [Da], MH+ [Da], Theo. MH+ [Da], DeltaM [ppm], Deltam/z [Da], Matched Ions, Total Ions, Intensity, Activation Type, MS Order, Isolation Interference [%], Ion Inject Time [ms], RT [min], First Scan, Last Scan, Master Scan(s), Spectrum File, Ions Matched, Annotation, -Log P-Score, -Log E-Score, C Score, Corrected Delta Mass (Da), Corrected Delta Mass (ppm).
2.3.1.2. For pTop output files, we recommend using only the file(s) ending with _filter.csv.
2.3.1.3. For TopPIC output files, we recommend using only the file(s) ending with _prsm.csv.
2.3.1.4. We recommend using the FASTA and/or XML database formats obtained from UniProt.
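The column requirement for the ProSightPD export can be verified before loading the file. The sketch below is an illustration only: it assumes the *_PSM.txt export is tab-delimited with a single header row (the delimiter is a parameter in case it is not), and `missing_columns` is a hypothetical helper name.

```python
import csv
import io

# Column names copied from the ProSightPD step above.
REQUIRED_COLUMNS = [name.strip() for name in """
Checked, Confidence, Identifying Node Type, Identifying Node, Search ID,
Identifying Node No, PSM Ambiguity, Sequence, Annotated Sequence,
Modifications, # Protein Groups, # Proteins, Master Protein Accessions,
Protein Accessions, Protein Descriptions, # Missed Cleavages, Charge,
Original Precursor Charge, DeltaScore, DeltaCn, Rank, Search Engine Rank,
m/z [Da], MH+ [Da], Theo. MH+ [Da], DeltaM [ppm], Deltam/z [Da],
Matched Ions, Total Ions, Intensity, Activation Type, MS Order,
Isolation Interference [%], Ion Inject Time [ms], RT [min], First Scan,
Last Scan, Master Scan(s), Spectrum File, Ions Matched, Annotation,
-Log P-Score, -Log E-Score, C Score, Corrected Delta Mass (Da),
Corrected Delta Mass (ppm)
""".replace("\n", " ").split(",")]


def missing_columns(header_line, delimiter="\t"):
    """Return the required columns that do not appear in the header line."""
    # csv.reader also copes with quoted header fields.
    header = next(csv.reader(io.StringIO(header_line), delimiter=delimiter), [])
    present = {column.strip() for column in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

In practice, the first line of the exported file would be read and passed to `missing_columns`; an empty list means every required column is present.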
2.4. The Parameters tab gives access to various parameters that usually do not need to be changed to combine the data.
2.4.1. Remove Contaminants: This option removes all protein sequences that represent a contaminant.
2.4.2. Remove Reverse Sequence: This option removes all decoy protein sequences.
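The effect of these two options can be illustrated with a small filter. This is a sketch under a stated assumption, not ProteoCombiner's implementation: contaminants and decoys are recognized here by the MaxQuant-style accession prefixes `CON__` and `REV__`, whereas other search engines may label them differently, and the text above does not say how ProteoCombiner itself detects them.

```python
def filter_proteins(accessions, remove_contaminants=True, remove_reverse=True,
                    contaminant_prefix="CON__", reverse_prefix="REV__"):
    """Drop contaminant and/or decoy accessions from a list.

    The prefixes are assumptions (MaxQuant conventions), exposed as
    parameters so they can be adapted to another search engine.
    """
    kept = []
    for accession in accessions:
        if remove_contaminants and accession.startswith(contaminant_prefix):
            continue
        if remove_reverse and accession.startswith(reverse_prefix):
            continue
        kept.append(accession)
    return kept
```

For example, `filter_proteins(["P12345", "CON__P00761", "REV__P12345"])` keeps only `"P12345"`.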
2.5. To start combining data, click on the OK button in the Combine tab.
PS: Although ProteoCombiner capitalizes on the data arising from different proteomics search engines, we recommend using a single software tool to analyze all experiments for proteolytic fragment characterization, and another to analyze intact protein experiments.
3. Exploring the results
Note: At this point we recommend saving the results by selecting Save from the File menu or by pressing CTRL + S.
3.1.1. All results are pre-filtered according to the following parameters, set at the top of the Results Browser window, as shown in Figure 2:
3.1.1.1. CombScore: Results containing identification scores greater than or equal to this value will be displayed.
3.1.1.2. Peptide Count: Only identified proteins with at least this number of identified peptides (by bottom-up) will be displayed.
3.1.1.3. Spectral Count: Only identified proteins with at least this number of identified spectra will be displayed.
3.1.1.4. Unique peptide: Only identified proteins with at least this number of identified unique peptides will be displayed.
3.1.1.5. Search: Only results from peptides or proteins containing the sequence entered in this field will be displayed. The user can also search by ProteinID (protein accession number), protein description or file name.
3.1.1.6. Clicking on the Filter button applies the filter using all of the parameters described above.
3.1.1.7. Clicking on the Reset button restores all initial values.
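The combined effect of these filter parameters can be sketched as a single predicate. This is a hypothetical illustration rather than the tool's code: the `ProteinResult` fields are invented for the example, all thresholds are assumed to be combined with a logical AND, and the Search field is assumed to be a case-insensitive substring match.

```python
from dataclasses import dataclass


@dataclass
class ProteinResult:
    # Hypothetical record; field names are not taken from ProteoCombiner.
    protein_id: str
    description: str
    file_name: str
    sequence: str
    comb_score: float
    peptide_count: int
    spectral_count: int
    unique_peptides: int


def passes_filter(result, min_comb_score=0.0, min_peptides=0,
                  min_spectra=0, min_unique=0, search=""):
    """Return True if the result satisfies every active filter parameter."""
    if result.comb_score < min_comb_score:
        return False
    if result.peptide_count < min_peptides:
        return False
    if result.spectral_count < min_spectra:
        return False
    if result.unique_peptides < min_unique:
        return False
    if search:
        # Match against sequence, accession, description or file name.
        needle = search.lower()
        fields = (result.sequence, result.protein_id,
                  result.description, result.file_name)
        return any(needle in field.lower() for field in fields)
    return True
```

Calling `passes_filter(result)` with the defaults corresponds to the unfiltered view in this sketch, since every threshold is zero and the search text is empty.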
3.2. Combined results
All identified proteins that contain a valid sequence* will be displayed on this tab sorted by Score followed by Sequence Coverage. The protein score is represented by the best proteoform score.
*Protein sequence present in the database.
3.2.1. By clicking on an identified protein, all respective identified proteoforms will be displayed* below the protein table sorted by CombScore**.
*If there is no identified proteoform for a respective protein, all possible identified peptides will be displayed instead.
**CombScore is calculated by summing two different scores: i) a score from the TDP identification software, normalized between 0 and 1; and ii) the fraction of the proteoform sequence covered by the peptides obtained in the BUP approach that match this proteoform, which also ranges between 0 and 1.
3.2.1.1. The user can assess each identified proteoform by clicking on the Is valid checkbox.
3.2.1.1.1. At this point, we recommend saving the results once again so that the personal assessments are included. This is done by selecting Save from the File menu or by pressing CTRL + S.
3.2.1.2. By clicking on an identified proteoform, all respective identified peptides will be displayed below the proteoform table sorted by Peptide Score.
3.2.1.2.1. By double-clicking on an identified peptide, the tandem mass spectrum with the best identification score will be displayed in the Spectrum Viewer (Figure 3).
3.2.2. By double-clicking on an identified protein, a new window will be opened that shows the Protein Coverage (section 3.5).
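The CombScore described in the note above can be sketched as follows. This is a minimal illustration under several assumptions: the top-down score is passed in already normalized to the range 0 to 1 (the normalization method is not specified here), a peptide is taken to match a proteoform when it occurs as an exact substring, modifications are ignored, and the function names are hypothetical.

```python
def sequence_coverage(proteoform_seq, peptides):
    """Fraction (0 to 1) of proteoform residues covered by matching peptides."""
    covered = [False] * len(proteoform_seq)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = proteoform_seq.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = proteoform_seq.find(peptide, start + 1)
    return sum(covered) / len(covered) if covered else 0.0


def comb_score(normalized_tdp_score, proteoform_seq, peptides):
    """Sum of the normalized top-down score and the bottom-up coverage."""
    if not 0.0 <= normalized_tdp_score <= 1.0:
        raise ValueError("normalized_tdp_score must be between 0 and 1")
    return normalized_tdp_score + sequence_coverage(proteoform_seq, peptides)


def protein_score(proteoform_scores):
    """Protein score taken as the best (highest) proteoform score."""
    return max(proteoform_scores)
```

In this sketch, overlapping peptides are counted once per residue, so coverage never exceeds 1 regardless of how many peptides map to the same region.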
3.3. Bottom-up proteomics results
All identified proteins and the corresponding peptides will be displayed on this tab.
3.3.1. By clicking on an identified protein, all respective identified peptides will be displayed below the protein table.
3.3.1.1. By clicking on an identified peptide, all respective identified tandem mass spectra will be displayed below the peptide table.
3.3.1.1.1. By double-clicking on an identified tandem mass spectrum, the Spectrum Viewer will be opened (Figure 3).
3.3.2. By double-clicking on an identified protein, a new window will be opened that shows the Protein Coverage (section 3.5).
3.4. Top-down proteomics results
All identified proteoforms will be displayed on this tab grouped by Theoretical Mass in Da.
3.4.1. By double-clicking on the Scan Number column of an identified proteoform, the Spectrum Viewer will be opened (Figure 3). By double-clicking on any other column, a new window will be opened that shows the Protein Coverage (section 3.5).
3.5. Protein Coverage
3.5.1. Once the window is opened, the following information will be displayed in the top tab: Protein description, Monoisotopic and Average protein mass* and Sequence coverage (Figure 4).
*If the Protein Coverage window is opened by clicking on a specific proteoform, its monoisotopic and average mass will be displayed instead of the protein mass.
3.5.2. The left box will display all approaches used to identify the proteoforms and/or peptides. By clicking on each item, all respective lines will be highlighted in the right box.
3.5.2.1. Bottom-up and middle-down approaches are represented in blue.
3.5.2.2. The top-down approach is represented in three different colors: expected proteoforms in orange, meaning all proteoforms identified by the full theoretical mass; non-expected proteoforms in cyan, meaning all identified truncated proteoforms; and tagged proteoforms in red, meaning all proteoforms that were identified by a part of the protein sequence.
3.5.2.3. All identified PTMs will also be displayed in this box.
3.5.3. The right box will display all identified proteoforms and/or peptides, each represented by a line and sorted by CombScore. The full protein sequence and all theoretical modifications (present in the database) will be displayed at the top, as can be seen in Figure 4. All theoretical chains will be shown below the protein sequence as gray dashed lines. The user can check all information about a modification or theoretical chain by hovering the mouse over the modified amino acid or the line (Figure 4).
3.5.3.1. Each proteoform will be displayed according to its classification: expected, non-expected or tagged proteoform. By right-clicking on each line, the user can assess the proteoform identification (valid or invalid); in addition, it is possible to highlight only the peptides that fit into the proteoform (Figure 5). By hovering over each line, the following information will be shown: proteoform sequence, score, search engine that identified this specific sequence, and start and end positions.
3.5.3.2. All identified PTMs will also be displayed; by hovering over each one, it is possible to check its position and description.
3.5.3.3. The identified proteoforms and/or peptides can be displayed in a single line or in multiple lines. The user can change the visualization by selecting the Custom Protein Visualization option from the Utils menu in the Results Browser window (or by pressing ALT + C).
3.6. Loading results
3.6.1. ProteoCombiner loads results in its own format (*.pcmb). This can be accomplished in three ways. The easiest is to double-click on a ProteoCombiner results file. If the Results Browser window is open, the file can also be loaded by selecting Load from the File menu or by pressing CTRL + O, as seen in Figure 6. Otherwise, if the main window is open, select Load Results from the File menu (or press CTRL + O), as seen in Figure 7.
3.7. Exporting results
3.7.1. ProteoCombiner also allows exporting all combined results to an Excel® (*.xlsx) or PDF® file. This is done by selecting Excel file from the File menu → Export results (or by pressing ALT + E), or by selecting PDF file from the File menu → Export results (or by pressing ALT + P) (Figure 6).