Applying the algorithm to a set of input structures
If not already installed, download and install the Perl programming language (http://www.perl.org).
Prepare a set of input structures in Brookhaven PDB file format. Structures may be downloaded from the Protein Data Bank (http://www.rcsb.org). X-ray structures are preferred over NMR structures because their sidechain positions are more accurate. If starting with a library of sequences, generate a set of 3D homology models using comparative-modeling software such as SWISS-MODEL (http://swissmodel.expasy.org/SWISS-MODEL.html) or Modeller (http://www.salilab.org/modeller/). NOTE: The performance of the algorithm depends partly on the resolution of the input structures, and better results will be obtained using high-quality structural data or homology models. Homology models should be constructed from high-quality templates (e.g. X-ray structures) and checked for errors using software such as WHATCHECK [4].
Run AFPredictor on the input structures. Unless specified otherwise, the program will use default values for the parameters. The usage for AFPredictor is shown in Fig. 1. If the option -d is specified, the number of OSCs and the FASA and TASA scores (see [2] for a description) will be printed as the program runs and written to the file output.txt. If the option -f is specified, additional information will be output, including the residues and atoms corresponding to predicted OSCs and their solvent accessibility as calculated by Vsurface [3]. Option -o will highlight predicted OSCs in an output PDB file. Atoms are highlighted in the B-factor column of the PDB file and can be viewed using molecular visualization software. For example, in Deep View (Swiss-PDB viewer) (http://www.expasy.org/spdbv/), the AFP output PDB file can be loaded, a molecular surface calculated, and the molecular surface layer colored by B-factor. Using the default Swiss-PDB color scheme, OSCs will appear red and all other atoms will appear blue (Fig. 2). Command-line use of AFPredictor is shown below:
perl AFPredictor.pl [-Options] {-f <PDB file> or -d <directory>}
Applying the algorithm to a background dataset (optional)
At this point, the user might wish to rank the scored test structures from step 3 against a background dataset such as a non-redundant set of PDB files. If default parameters have been used, the user may compare with the previous results compiled in the file 2006results.txt. In this case, steps 4-7 may be omitted. However, if any parameters used in step 3 have been changed from the default values or the user wishes to use a different data set for comparison, proceed to step 4.
Create a list of non-redundant protein sequences for comparison. This may be done using the PISCES server [5] accessible at http://dunbrack.fccc.edu/PISCES.php. It is important that redundant sequences/structures be removed from the background dataset because multiple instances of the same structure may statistically skew the final results. Ensure that incomplete (e.g. alpha-carbon only) PDB files are removed in the culling procedure.
Retrieve the corresponding structures from the PDB and place all files in the same directory (e.g. /nonredundantPDB/). All files should be in PDB format and have the extension .pdb. A PDB web download utility is available at http://www.rcsb.org/pdb/static.do?p=download/web_download/download.jsp.
NOTE: If the PDB was culled ‘by chain’ instead of ‘by entry’ in step 4, retrieve only the specified chain for each PDB file. If this approach is used, the algorithm may analyze surfaces that are internal in the native structure. If this is unwanted, cull the PDB ‘by entry’ (in step 4), obtain and use a monomeric structure database, or process PDB files through the PQS server (http://pqs.ebi.ac.uk/).
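Before running the algorithm on the background set, it may help to confirm that the directory meets the requirements above: every file carries the .pdb extension and none is alpha-carbon only. The following is a minimal Python sketch of such a check, not part of AFPredictor; the function names `is_ca_only` and `check_directory` are hypothetical, and the check assumes standard fixed-column ATOM records (atom name in columns 13-16).

```python
from pathlib import Path


def is_ca_only(pdb_text: str) -> bool:
    """Return True if the text has ATOM records and all of them are alpha carbons."""
    # Atom name occupies columns 13-16 of an ATOM record (0-based slice 12:16).
    names = [line[12:16].strip()
             for line in pdb_text.splitlines()
             if line.startswith("ATOM")]
    return bool(names) and all(name == "CA" for name in names)


def check_directory(directory: str) -> dict:
    """List files lacking the .pdb extension and .pdb files that are CA-only."""
    wrong_extension, ca_only = [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        if path.suffix != ".pdb":
            wrong_extension.append(path.name)
        elif is_ca_only(path.read_text()):
            ca_only.append(path.name)
    return {"wrong_extension": wrong_extension, "ca_only": ca_only}
```

Calling `check_directory("/nonredundantPDB/")` returns two lists of file names to rename or remove. Only ATOM records are inspected, so calcium ions stored as HETATM records named CA are not mistaken for alpha carbons.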
(Optional) Remove hydrogen atoms from all PDB files. Doing so will improve the overall runtime of the algorithm, especially if the structural database is large (>500 structures).
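One possible way to remove hydrogens is sketched below in Python. It is an illustrative helper rather than a tool distributed with AFPredictor, and the names `is_hydrogen` and `strip_hydrogens` are hypothetical. The sketch assumes fixed-column PDB records: it trusts the element symbol in columns 77-78 when present and otherwise falls back to a heuristic on the atom name in columns 13-16.

```python
def is_hydrogen(line: str) -> bool:
    """Heuristic test for a hydrogen (or deuterium) atom record."""
    if not line.startswith(("ATOM", "HETATM", "ANISOU")):
        return False
    element = line[76:78].strip().upper()
    if element:
        # Element column (77-78) is filled in: trust it.
        return element in ("H", "D")
    # Fallback: inspect the atom name in columns 13-16.
    name = line[12:16].ljust(4)
    # Names such as " HB2" or "1HB " carry the H in column 14.
    if name[0] in " 0123456789" and name[1] == "H":
        return True
    # Four-character names such as "HD11" start in column 13;
    # shorter names starting there (e.g. "HG  ", mercury) are kept.
    return name[0] == "H" and " " not in name


def strip_hydrogens(pdb_text: str) -> str:
    """Return the PDB text with hydrogen atom records removed."""
    kept = [line for line in pdb_text.splitlines() if not is_hydrogen(line)]
    return "\n".join(kept) + "\n"
```

The sketch leaves atom serial numbers and any CONECT records unchanged, so files that depend on them may need renumbering with a dedicated tool.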
Run the algorithm on the non-redundant PDB directory (specify the parameter -d). An example command (using default parameters) is shown below.
perl AFPredictor.pl -d /nonredundantPDB/
Sort the results from step 3 and step 7 by FASA or TASA to obtain a final FASA or TASA ranking. A percentile score for each structure can be calculated as 1 - (ranking / number of structures). This score should not be interpreted as the strength of ice-binding activity or the likelihood that a structure is an ice-binding protein, but as a relative measure of the surface area occupied by predicted OSCs.
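The ranking and percentile calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script: the function name `percentile_scores` is hypothetical, and the scores are assumed to have been collected by hand into a dictionary, because the layout of output.txt is not described here. The sketch also assumes that rank 1 goes to the largest FASA or TASA value, so that structures with more predicted OSC surface area receive higher percentiles.

```python
def percentile_scores(scores: dict) -> dict:
    """Map each structure name to (rank, percentile), with percentile = 1 - rank / N."""
    # Assumption: descending sort, so rank 1 is the largest score.
    # Ties keep their input order because Python's sort is stable.
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    n = len(ordered)
    return {name: (rank, 1 - rank / n)
            for rank, (name, _score) in enumerate(ordered, start=1)}


# Hypothetical scores: one test structure plus three background structures.
example = {"testA": 950.0, "bg1": 400.0, "bg2": 1200.0, "bg3": 100.0}
result = percentile_scores(example)
print(result["testA"])  # (2, 0.5): second of four structures
```

With this formula the top-ranked structure scores 1 - 1/N rather than 1, and the lowest-ranked structure scores 0.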