Metal ions are bound to biological macromolecules via coordination bonds, which are formed by the so-called donor atoms. Donor atoms can belong to the backbone or to the side chains/bases of the macromolecule (protein or nucleic acid), as well as to non-macromolecular ligands such as oligopeptides, small organic molecules, anions and water molecules. A metal ion together with its donor atoms and ligands constitutes the metal-binding site. However, the biochemical properties of such a site also depend on the surrounding macromolecular environment (5-9). Consequently, we defined the “minimal functional site” (MFS) in a metal-macromolecule adduct as the ensemble of atoms containing the metal ion or cofactor, all its ligands and any other atom belonging to a chemical species within 5 Å of a ligand (1,10). The MFS describes the local 3D environment around the cofactor, independently of the larger context of the protein fold in which it is embedded. The MetalPDB database is an updated collection of all structurally characterized MFSs (11). Recently, we have developed a computational approach, implemented in the MetalS2 program, to quantify the structural similarity of MFSs in metalloproteins (1). In this work we exploited MetalS2 to perform systematic, quantitative comparisons of MFS structures with the ultimate aim of producing a classification of metal sites. This classification does not depend on the overall metalloprotein fold and describes the structural variability of MFSs within a metalloprotein family. Furthermore, it indicates possible relationships between different metalloprotein families binding the same metal cofactors.
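The MFS definition given above can be illustrated with a minimal sketch. It is not the MetalPDB or MetalS2 implementation: the `Atom` record, the function name `minimal_functional_site`, the identification of a "chemical species" with a residue or molecule identifier, and the inclusive treatment of the 5 Å cutoff are all assumptions made for illustration only.

```python
import math
from collections import namedtuple

# Hypothetical atom record: species_id identifies the residue or molecule
# (the "chemical species") that the atom belongs to.
Atom = namedtuple("Atom", ["species_id", "name", "xyz"])


def minimal_functional_site(atoms, metal_id, ligand_ids, cutoff=5.0):
    """Return the atoms of the metal, of its ligands, and of every species
    having at least one atom within `cutoff` angstroms of a ligand atom."""
    ligand_atoms = [a for a in atoms if a.species_id in ligand_ids]
    selected = {metal_id} | set(ligand_ids)
    for atom in atoms:
        if atom.species_id in selected:
            continue  # species already part of the site
        # Assumption: the cutoff is inclusive (distance <= cutoff).
        if any(math.dist(atom.xyz, lig.xyz) <= cutoff for lig in ligand_atoms):
            selected.add(atom.species_id)
    # A species is included as a whole, even if only one atom is close.
    return [a for a in atoms if a.species_id in selected]


# Toy coordinates, invented for this example.
atoms = [
    Atom("ZN", "ZN", (0.0, 0.0, 0.0)),
    Atom("HIS1", "NE2", (2.0, 0.0, 0.0)),
    Atom("HIS1", "CB", (4.0, 0.0, 0.0)),
    Atom("GLY2", "CA", (8.0, 0.0, 0.0)),   # 4 A from HIS1 CB: included
    Atom("GLY2", "N", (12.0, 0.0, 0.0)),   # far, but same species as CA
    Atom("ALA3", "CB", (20.0, 0.0, 0.0)),  # no atom near a ligand: excluded
]
site = minimal_functional_site(atoms, metal_id="ZN", ligand_ids={"HIS1"})
site_species = sorted({a.species_id for a in site})
```

In this toy case the site contains the metal, the histidine ligand and the whole neighbouring glycine, while the distant alanine is left out.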
The present computational protocol organizes MFSs into clusters in such a way that each cluster contains sites that are structurally similar to each other and differ from the sites of the other clusters. The procedure uses a hierarchical agglomerative clustering algorithm to obtain a structure-based classification. In agglomerative clustering every individual object is initially considered as a singleton (i.e., a cluster containing only one member). Then the clusters are iteratively grouped by merging the two clusters at the shortest “distance”, i.e., the most similar pair. For the present work, the distance measure adopted was the global MetalS2 score, which increases with increasing structural diversity. Two merged clusters become one cluster, so after each iteration there is one fewer cluster. The iterations are repeated until all objects are collected into a single cluster. The result of hierarchical clustering is a nested sequence of partitions, with a single, all-inclusive cluster at the top and all singleton clusters at the bottom. Each intermediate cluster can be viewed as a combination of two clusters from the lower level or as a part of a split cluster from the higher level.
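The iterative merging described above can be sketched in a few lines. This is a naive illustration, not the code used in this work: the function names are hypothetical, the score matrix is invented toy data standing in for pairwise global MetalS2 scores, and the cluster-to-cluster distance is passed in as a parameter (the greatest pairwise score is used here as one possible choice).

```python
from itertools import combinations


def agglomerate(scores, cluster_distance):
    """Return the nested sequence of partitions, from all singletons
    to one all-inclusive cluster, merging the closest pair each time."""
    clusters = [frozenset([i]) for i in range(len(scores))]
    partitions = [list(clusters)]
    while len(clusters) > 1:
        # Pick the two clusters at the shortest distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], scores),
        )
        # The merged pair becomes one cluster: one fewer cluster per step.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        partitions.append(list(clusters))
    return partitions


def greatest_pairwise(a, b, scores):
    """Assumed cluster distance: largest score between any two members."""
    return max(scores[i][j] for i in a for j in b)


# Invented symmetric matrix of dissimilarity scores for four sites.
scores = [
    [0.0, 0.5, 3.0, 3.2],
    [0.5, 0.0, 2.8, 3.1],
    [3.0, 2.8, 0.0, 0.7],
    [3.2, 3.1, 0.7, 0.0],
]
partitions = agglomerate(scores, greatest_pairwise)
```

Each entry of `partitions` is one level of the hierarchy; every cluster at a given level is either carried over from the level before or is the union of two clusters from it.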
Hierarchical clustering methods differ in the way they merge clusters (linkage methods). Although all methods merge the two “closest” clusters at each step, they differ in how they determine the distance between clusters, i.e., they use different metrics to compare one cluster with another. Here we used both the complete and average linkage methods (12). For complete linkage, the distance between a pair of clusters corresponds to the greatest distance from any member of one cluster to any member of the other cluster. In the average linkage method, the distance between two clusters is the average of the distances between all the members of one cluster and all the members of the other. The final clusters are defined by cutting the nested sequence of partitions at a certain threshold. Two clusters are considered to be separate if the distance between them (a value of the global MetalS2 score) is greater than this threshold. The value of the threshold also determines the extent to which the objects are similar within each cluster (the diameter of the clusters, which, however, is affected by the linkage type used).
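The two linkage definitions and the threshold cut can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the procedure actually run here: the function names and the distance matrix are hypothetical, and the cut is implemented by stopping the merging as soon as the closest pair of clusters lies above the threshold, which gives the same clusters as cutting the full hierarchy for these two linkage methods.

```python
from itertools import combinations


def complete_linkage(a, b, d):
    """Greatest distance between any member of a and any member of b."""
    return max(d[i][j] for i in a for j in b)


def average_linkage(a, b, d):
    """Mean of all distances between members of a and members of b."""
    return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))


def cut_clusters(d, linkage, threshold):
    """Merge the closest clusters until the closest pair is further
    apart than `threshold`; return the clusters as sorted index lists."""
    clusters = [frozenset([i]) for i in range(len(d))]
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(pair[0], pair[1], d),
        )
        if linkage(a, b, d) > threshold:
            break  # remaining clusters are all separate at this threshold
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


# Invented distances for three objects (stand-ins for MetalS2 scores).
d = [
    [0.0, 1.0, 2.5],
    [1.0, 0.0, 1.5],
    [2.5, 1.5, 0.0],
]
by_complete = cut_clusters(d, complete_linkage, threshold=2.2)
by_average = cut_clusters(d, average_linkage, threshold=2.2)
```

With the same threshold, complete linkage keeps the third object separate (its greatest distance to the first cluster is 2.5), whereas average linkage absorbs it (mean distance 2.0). Under complete linkage no two members of a final cluster are further apart than the threshold; average linkage gives no such bound, which is why the cluster diameter depends on the linkage type.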