Detailed steps for the three phases are given below (please also refer to http://caps.ncbs.res.in/download/protocols_network/steps.html for step-by-step illustrations with examples). All commands must be executed in the directory that contains the protein structure of interest, unless specified otherwise.
Initial alignment phase
1| An initial alignment for pairwise structure comparison can be obtained with MINRMS by following the steps in option A; an initial alignment for multiple structure comparison can be obtained with STAMP by following the steps in option B.
(A) Steps to obtain pairwise alignment
(i) /$path/minrms -HS first.pdb second.pdb
More details can be found at the following URL: "http://caps.ncbs.res.in/download/protocols_network/steps.html#minrms":http://caps.ncbs.res.in/download/protocols_network/steps.html#minrms
Select the best alignment using the following steps.
(ii) MINRMS provides a number of aligned MSF files for the user to select the best alignment. These result files can be imported into CHIMERA31 and the user can visualize and choose the best alignment. Usually the best alignment can be identified as the one with the highest log-P value where the longest distance could be met graphically.
(iii) Using the above logic, a script file “bestalign” was used to retrieve the best alignment file automatically without the graphical viewer, and the result was later cross-checked in a graphical window. We recommend employing the “bestalign” script to retrieve the best alignment file if the user wishes to examine and employ several pairwise alignments for subsequent analysis.
(iv) The best alignment retrieved through MINRMS is treated as the initial alignment and is seeded in COMPARER in the next step to derive the best final alignment. However, MINRMS is relatively time-consuming and is best run in a cluster environment.
(B) Steps to obtain multiple alignment (TROUBLESHOOTING)
More details with example can be found at the following URL:
(i) Create an input query file using the command "/$path/" with the extension ".database"
CRITICAL STEP The first structure listed in the file “filename.database” is treated as the representative structure for the multi-member alignment and is used to scan the other structures, so that closely related structures are aligned next to each other. The user should choose the best representative structure: one with no structural loss in the core region and no extra length.
(ii) Run STAMP by "/$path/stamp -l query_file -n 2 -s -slide 5 -prefix query_name -d database_file", where option 'l' gives the input file, 'n' the number of fits, 's' turns scan mode on, 'slide' gives the number of residues by which the query is slid along each database sequence, and 'prefix' gives the prefix of the output files.
(iii) Run SORTTRANS by "/pathname/sorttrans -f query_name.scan -s Sc 2.0 >query_name.sorted".
(iv) Run TRANSFORM by "/pathname/transform -f query_name.sorted -g", where "-g" is for graphical output.
(v) Run POSTSTAMP by "/pathname/poststamp -f query_name -min 0.5" to check whether each position in the structural alignment is structurally equivalent across all members of the alignment and to check the number of pairwise comparisons with a Pij value higher than or equal to a cutoff.
(vi) Run STAMP_CLEAN by "/pathname/stamp_clean query_name.post 3 > query_name.clean" to clean up nonsensical gaps in the alignment.
(vii) Run ACONVERT by "/pathname/aconvert -in b -out p < query_name.clean > query_name.ali" to get the INITIAL STRUCTURAL ALIGNMENT in ClustalW or MSF format from the STAMP block file format.
CRITICAL STEP It is important and critical to seed an initial alignment of good quality in order to perform the final alignment properly.
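The option B commands can be chained in a small driver script. The Python sketch below only assembles the command lines quoted in steps (ii) to (vii) and, when asked, runs them in order. The function names `stamp_pipeline` and `run_pipeline`, the `bin_dir` argument and the example file names are illustrative assumptions, not part of STAMP. The sketch also assumes that the value 3 in step (vi) is a separate argument to stamp_clean rather than part of the redirection, and that the converter binary is called `aconvert`.

```python
import subprocess


def stamp_pipeline(bin_dir, query_file, database_file, query_name):
    """Return the option B steps as (argv, stdin_path, stdout_path) tuples.

    Only the programs and flags quoted in the protocol text are used.
    """
    def exe(name):
        return f"{bin_dir}/{name}"

    return [
        # (ii) scan the database with the query
        ([exe("stamp"), "-l", query_file, "-n", "2", "-s", "-slide", "5",
          "-prefix", query_name, "-d", database_file], None, None),
        # (iii) sort the scan output, written to query_name.sorted
        ([exe("sorttrans"), "-f", f"{query_name}.scan", "-s", "Sc", "2.0"],
         None, f"{query_name}.sorted"),
        # (iv) transform the coordinates, with graphical output
        ([exe("transform"), "-f", f"{query_name}.sorted", "-g"], None, None),
        # (v) check structural equivalence of the aligned positions
        ([exe("poststamp"), "-f", query_name, "-min", "0.5"], None, None),
        # (vi) clean up gaps; "3" is assumed to be a positional argument
        ([exe("stamp_clean"), f"{query_name}.post", "3"],
         None, f"{query_name}.clean"),
        # (vii) convert the block file; reads stdin and writes stdout
        ([exe("aconvert"), "-in", "b", "-out", "p"],
         f"{query_name}.clean", f"{query_name}.ali"),
    ]


def run_pipeline(steps, dry_run=True):
    """Print each step as a shell-like line; execute it unless dry_run."""
    shown_lines = []
    for argv, stdin_path, stdout_path in steps:
        shown = " ".join(argv)
        if stdin_path:
            shown += f" < {stdin_path}"
        if stdout_path:
            shown += f" > {stdout_path}"
        print(shown)
        shown_lines.append(shown)
        if dry_run:
            continue
        stdin = open(stdin_path) if stdin_path else None
        stdout = open(stdout_path, "w") if stdout_path else None
        try:
            # check=True stops the pipeline at the first failing program
            subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)
        finally:
            for handle in (stdin, stdout):
                if handle:
                    handle.close()
    return shown_lines
```

Calling `run_pipeline(stamp_pipeline("/pathname", "query_file", "database_file", "query_name"))` prints the commands without executing them, which allows the paths to be checked before a real run with `dry_run=False`.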
Final alignment phase
2| Accessory files. Accessory files containing information such as solvent accessibility, secondary structure, and H-bonding patterns can be obtained as separate files using the “/pathname/joy filename.ali” command from the JOY-5 package.
3| Initial equivalences
(A) JOY-4v. If the user employs the JOY-4v package, the command “/pathname/joy -m filename.ali” may be used to obtain the initial equivalences. The automatically generated result file “filename.mnt” should be renamed to “mnf1.inp”.
(B) SSTEQ. If the user wishes to employ our in-house SSTEQ script, the command should be executed just outside the directory as “perl SSTEQ.pl directory-name”. The result file “mnf1.inp” will be created automatically inside the directory, which is convenient for subsequent steps.
4| COMPARER (TROUBLESHOOTING). The initial equivalences obtained above are fed as a steering input file to the COMPARER package with the following steps. More details with example can be found at the following URL:
(A) First Stage (TROUBLESHOOTING). Before running the following steps, the user needs to create two files: one listing the input structures, which should be named “codes.nam”, and one describing the relationship between the input structures, which should be named “codes.tre”.
(i) PREMNF. Run PREMNF to perform pairwise least-squares superimposition with the command "/pathname/pmnfc mnf1.inp" and the steering input file mnf2.inp, which can be copied from the example directory where the COMPARER package has been installed.
(ii) MNFC (TROUBLESHOOTING). Run the command "/pathname/mnfc" with the steering input file, mnfc.inp, which was created as an output file in the previous step.
(iii) HPB2. Run the command "/pathname/hpb2" to obtain hydrophobic contacts as an output “filename.hpc”.
(iv) HBOND. Run “/pathname/hbond filename.pdb” to obtain side-chain hydrogen bonds in the output file “filename.shb”.
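Because the first stage depends on several files being present in the working directory, a quick check before launching the programs can save a failed run. The following Python sketch tests for the four input files named in the text above (codes.nam, codes.tre, mnf1.inp and mnf2.inp). The function name `missing_inputs` and the rule that an empty file counts as missing are assumptions made for illustration; they are not part of COMPARER.

```python
from pathlib import Path

# Input file names quoted in the protocol for the first COMPARER stage.
REQUIRED_INPUTS = ("codes.nam", "codes.tre", "mnf1.inp", "mnf2.inp")


def missing_inputs(workdir, required=REQUIRED_INPUTS):
    """Return the required files that are absent or empty in workdir."""
    base = Path(workdir)
    missing = []
    for name in required:
        path = base / name
        # Assumption: a zero-byte file is as unusable as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list returned by `missing_inputs(".")` indicates that all four files are in place in the current directory; the sketch does not validate their contents.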
(B) Second Stage (Simulated annealing)
CRITICAL STEP Simulated annealing is the critical step for obtaining the best multiple structural alignment. The benefit of this step is most apparent when the sequences are highly diverged.
(i) PANN9. Run the command “/pathname/pann9” with the input parameter file pann9.inp, which can be copied from the example directory of the COMPARER package. This program produces a file for each protein that defines all the relationships of the selected type in that protein.
(ii) SPLITTER. Optionally, run the command “/pathname/splitter” to produce separate relationship tables from ‘mixed relationship’ files.
(iv) PREANN. Run the command “/pathname/preann” to construct the steering data file for the ANN9 program.
(v) ANN9. Run the command “/pathname/ann9” to produce several pairwise alignments.
(vi) POSTANN. Run the command “/pathname/postann” to transform filename.ann files into AM13 format.
(C) Third stage (final alignment)
(i) PRDGP. Run the command "/pathname/prdgp" to obtain gap penalties.
(ii) AM13. Run the command "/pathname/am13" to get the final alignment in the COMPARER format; this program uses various parameter files and output files from the previous steps.
(iv) ALNPAP. Run the command "/pathname/alnpap" to get the alignment in PIR format.
5| Final equivalences. The equivalences from the final alignment must be calculated using the same procedures as in Step 3.
6| Superposed coordinates. These can be obtained through JOY-3.2v using the command “/pathname/mnyfit -f” with the steering input file obtained in the previous step.
Alignment Assessment Phase
Alignments derived using purely sequence or structure-based properties can be compared for structural deviations after rigid-body superposition and secondary structural equivalence at the level of superfamily relationships.
7| Mean RMSD. The mean root-mean-square deviation (RMSD) can be measured one-against-all within the group of structures that was compared. From the analysis of carefully curated alignments of a previous version of the database13, we observe that, despite distant relationships, this value is generally less than 5.5 Å. Therefore, any superfamily member in the derived alignment with a mean RMSD of more than 5.5 Å is best removed and treated as an outlier.
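The outlier test in Step 7 can be expressed in a few lines once pairwise RMSD values are available. The Python sketch below assumes that "one-against-all" means averaging, for each member, its pairwise RMSD to every other member, and that the values have already been collected into a symmetric matrix. The function names and the example numbers are hypothetical; only the 5.5 Å cutoff comes from the text.

```python
def mean_rmsd_per_member(names, rmsd):
    """Return {name: mean RMSD (in Å) of that member against all others}.

    rmsd[i][j] is the pairwise RMSD between members i and j; the
    diagonal (a member against itself) is ignored.
    """
    n = len(names)
    if n < 2:
        raise ValueError("at least two structures are needed")
    if len(rmsd) != n or any(len(row) != n for row in rmsd):
        raise ValueError("rmsd must be an n x n matrix")
    means = {}
    for i, name in enumerate(names):
        others = [rmsd[i][j] for j in range(n) if j != i]
        means[name] = sum(others) / len(others)
    return means


def rmsd_outliers(means, cutoff=5.5):
    """Return the members whose mean RMSD exceeds the cutoff (in Å)."""
    return sorted(name for name, value in means.items() if value > cutoff)


# Hypothetical pairwise values for three structures.
example_names = ["a", "b", "c"]
example_rmsd = [
    [0.0, 2.0, 7.0],
    [2.0, 0.0, 6.0],
    [7.0, 6.0, 0.0],
]
example_means = mean_rmsd_per_member(example_names, example_rmsd)
print(rmsd_outliers(example_means))  # prints ['c']
```

In this made-up example, member c has a mean RMSD of 6.5 Å and would be set aside, while a (4.5 Å) and b (4.0 Å) would be kept.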
8| Percentage secondary structure equivalences
The concept of superfamily-level relationships implies high structural similarity and secondary structural equivalences15. Therefore, the number of alignment positions that retain the majority secondary structure in more than 75% of members can be calculated and normalised by the mean number of non-gap positions over the entire alignment for all the superfamily members. From the analysis of carefully curated alignments of a previous version of the database13, we found that this normalised factor of secondary structural equivalence is ≥ 30%. This threshold can be adopted to recognise poorly aligned superfamilies: an alignment whose value falls below the threshold is considered significantly poorly aligned.
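As a rough illustration of the measure in Step 8, the Python sketch below computes it from secondary-structure strings that have already been aligned. Several details are assumptions not fixed by the text: the one-letter codes (H for helix, E for strand, "-" for a gap), the decision to count only helix and strand as secondary structure, and the convention that the 75% fraction is taken over all members, so that gaps count against a position. The function name `sse_equivalence_percent` is hypothetical.

```python
from collections import Counter


def sse_equivalence_percent(aligned_ss, min_fraction=0.75, gap="-",
                            counted="HE"):
    """Percentage of secondary-structure equivalence for one alignment.

    aligned_ss holds one string per member, all of equal length. A
    position is counted when its most common counted state occurs in
    more than min_fraction of the members. The count is normalised by
    the mean number of non-gap positions per member.
    """
    if not aligned_ss:
        raise ValueError("no aligned sequences given")
    length = len(aligned_ss[0])
    if any(len(row) != length for row in aligned_ss):
        raise ValueError("all rows must have the same length")
    n_members = len(aligned_ss)
    conserved = 0
    for column in zip(*aligned_ss):
        counts = Counter(state for state in column if state in counted)
        if not counts:
            continue
        top_count = counts.most_common(1)[0][1]
        if top_count / n_members > min_fraction:
            conserved += 1
    non_gap = [len(row) - row.count(gap) for row in aligned_ss]
    mean_non_gap = sum(non_gap) / n_members
    if mean_non_gap == 0:
        raise ValueError("alignment contains only gaps")
    return 100.0 * conserved / mean_non_gap


# Hypothetical four-member alignment of secondary-structure states.
example_rows = [
    "HHHHCCEE",
    "HHHHC-EE",
    "HHHCCCEE",
    "HHH-CCEC",
]
example_value = sse_equivalence_percent(example_rows)
```

For this made-up alignment, four positions pass the 75% test and the mean non-gap length is 7.5, giving a value of about 53%, which is above the 30% threshold quoted above.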