A new coronavirus named Severe Acute Respiratory Syndrome Coronavirus 2, or SARS-CoV-2, caused an outbreak of pulmonary disease in the city of Wuhan, China, at the end of 2019. The outbreak has since grown into a global pandemic of the disease named COVID-19.1–4 A significant research push is now underway to repurpose existing drugs and to design new therapeutic agents targeting various components of the virus.5 The viral single-stranded RNA genome is 82% identical to that of the earlier SARS coronavirus (SARS-CoV), and some viral proteins are more than 90% homologous to their SARS-CoV counterparts.6 Like many other single-stranded RNA viruses, SARS-CoV-2 employs a chymotrypsin-like protease (3C-like main protease, or 3CL Mpro) to produce the non-structural proteins essential for viral replication.7–9
3CL Mpro cleaves the two large overlapping polyproteins pp1a and pp1ab at no fewer than 11 conserved sites, including its own N- and C-terminal autoprocessing sites. The enzyme's recognition sequence is Leu-Gln↓Ser-Ala-Gly, where ↓ marks the cleavage site, but the enzyme is promiscuous and also cleaves sequences that deviate from this consensus. The absolute dependence of the virus on the correct function of this protease, together with the absence of a homologous human protease, makes 3CL Mpro an attractive, albeit difficult, target for the design of specific protease inhibitors.10 Unfortunately, despite significant research effort over the past fifteen years, no inhibitor targeting SARS-CoV 3CL Mpro has yet been approved by the FDA.11–17
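As a toy illustration of how a recognition sequence defines cut points, the sketch below scans a protein sequence (one-letter codes) for Leu-Gln at P2-P1 and splits the chain after the Gln. It assumes, for simplicity, that the Leu-Gln pair alone defines a site and ignores the P1′-P3′ residues (consensus Ser-Ala-Gly), which only crudely mimics the enzyme's sequence promiscuity; real substrate specificity is more nuanced. The function names `find_cleavage_sites` and `cleave` and the toy sequence are hypothetical.

```python
import re


def find_cleavage_sites(seq: str) -> list[int]:
    """Return 0-based cut positions (index of the first residue after each cut).

    Simplifying assumption: any Leu-Gln (LQ) pair at P2-P1 is treated as a site;
    the P1'-P3' residues (consensus Ser-Ala-Gly) are not checked.
    """
    return [m.end() for m in re.finditer("LQ", seq)]


def cleave(seq: str) -> list[str]:
    """Split seq into fragments at every predicted cut position."""
    cuts = [0] + find_cleavage_sites(seq) + [len(seq)]
    # Skip the empty fragment produced when a site falls at the very end.
    return [seq[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]


toy = "TSAVLQSGFRKMAFPSGKVE"  # hypothetical toy sequence
print(find_cleavage_sites(toy))
print(cleave(toy))
```

For the toy sequence this reports a single cut after the Gln at index 5, giving the fragments `TSAVLQ` and `SGFRKMAFPSGKVE`. A more realistic model would score each position with a substrate-preference matrix rather than an exact motif match.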
The 3CL Mpro structure is composed of three domains.18,19 Domains I (residues 8–101) and II (residues 102–184) consist of antiparallel β-barrels and constitute the catalytic domains. Domain III (residues 201–303) consists of five α-helices and is responsible for enzyme dimerization. This helical domain is essential for protease function, because the monomeric enzyme is not catalytically active. 3CL Mpro therefore forms a functional dimer through intermolecular interactions, mainly between the helical domains (Figure 1a).
3CL Mpro has diverged from related proteases to use an unconventional catalytic Cys residue. Unlike other chymotrypsin-like enzymes and many Ser (or Cys) hydrolases, it has a catalytic Cys-His dyad instead of a canonical Ser(Cys)-His-Asp(Glu) triad.8 The catalytic residues Cys145 and His41 lie buried within an active-site cavity on the protein surface. This cavity can accommodate four substrate residues, in positions P1′ through P4, and is flanked by residues from both domains I and II (Figure 1b).
Here we present new atomic-level details relevant to the function of SARS-CoV-2 3CL Mpro and to inhibitor binding. To obtain them, we grew large crystals (Figure S1) suitable for data collection on a home X-ray source, which minimizes radiation damage, and determined a room-temperature (293 K) X-ray structure of the enzyme at 2.30 Å resolution. In our structure of ligand-free 3CL Mpro, the catalytic Cys145 Sγ atom is 3.8 Å from His41 Nε2, a distance that appears too long for a hydrogen bond (Figure 2). This is not surprising, given the poor hydrogen-bonding properties of thiols and the experimental pKa values of 8.0 ± 0.3 for Cys145 and 6.3 ± 0.1 for His41 measured previously for SARS-CoV 3CL Mpro, which shares 96% homology with the SARS-CoV-2 enzyme.20,21 Thus, under our crystallization conditions (see Methods), at a drop pH of 7.0, both catalytic residues are expected to be uncharged, corresponding to the enzyme's resting state.
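The expectation that both catalytic residues are neutral at pH 7.0 follows from the Henderson–Hasselbalch equation. The sketch below assumes that the quoted SARS-CoV pKa values (8.0 for Cys145, 6.3 for His41) carry over to the SARS-CoV-2 enzyme and treats each group as an independent single-site titration, ignoring any coupling between them; the helper name `fraction_deprotonated` is hypothetical.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable group in its base form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PH_DROP = 7.0
PKA_CYS145 = 8.0  # thiol (neutral) <-> thiolate (anionic)
PKA_HIS41 = 6.3   # imidazolium (cationic) <-> imidazole (neutral)

# Cys145 is neutral when protonated; His41 is neutral when deprotonated.
neutral_cys = 1.0 - fraction_deprotonated(PKA_CYS145, PH_DROP)
neutral_his = fraction_deprotonated(PKA_HIS41, PH_DROP)

print(f"Cys145 neutral thiol:    {neutral_cys:.0%}")
print(f"His41 neutral imidazole: {neutral_his:.0%}")
```

With these inputs the neutral fractions come out to about 91% for Cys145 and 83% for His41, consistent with both residues being predominantly uncharged in the drop.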
In this resting state, the Cys145 thiol is protonated and the His41 imidazole is neutral. The catalytic dyad would then be activated by proton transfer from Cys145 to His41, possibly triggered by substrate binding or occurring in the transition state as the sulfur attacks the carbonyl carbon of the scissile peptide bond. Instead of interacting with Cys145, His41 forms a strong hydrogen bond with a water molecule, speculatively named H2Ocat, that is in turn stabilized by hydrogen bonds of 2.9 and 3.0 Å with the side chains of Asp187 and His164, respectively. The position of Asp187 is further stabilized by a salt bridge with the nearby Arg40.
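A rough thermodynamic reading of the proposed Cys145 to His41 proton transfer follows from the same pKa values: for transfer from an acid with pKa_donor to a base whose conjugate acid has pKa_acceptor, K = 10^(pKa_acceptor − pKa_donor) and ΔG° = −RT ln K. This sketch assumes the pKa values apply unchanged to the intramolecular transfer, which ignores any extra stabilization an ion pair would gain in the active site, so it only indicates that the thiolate/imidazolium form is disfavored in the resting enzyme. The function name `proton_transfer` is hypothetical, and 293 K is taken from the crystallographic experiment.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def proton_transfer(pka_donor: float, pka_acceptor: float, temp_k: float):
    """Equilibrium constant and standard free energy (kJ/mol) for
    donor-H + acceptor <-> donor(-) + acceptor-H(+)."""
    k_eq = 10.0 ** (pka_acceptor - pka_donor)
    delta_g = -R * temp_k * math.log(k_eq) / 1000.0
    return k_eq, delta_g


# Cys145 (pKa 8.0) donating a proton to His41 (conjugate-acid pKa 6.3) at 293 K.
k, dg = proton_transfer(pka_donor=8.0, pka_acceptor=6.3, temp_k=293.0)
print(f"K = {k:.3f}, dG = {dg:+.1f} kJ/mol")
```

The 1.7-unit pKa gap gives K ≈ 0.02 and ΔG° ≈ +9.5 kJ/mol, i.e., the charged dyad is a minor species unless something, such as substrate binding, shifts the balance.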
H2Ocat sits at the center of a network of polar contacts, mediating interactions between the catalytic His41, the conserved His164, and the conserved Asp187, which is located at the junction of domains II and III. It is reasonable to suggest that this water may act as the third catalytic residue, completing a non-canonical catalytic triad in 3CL Mpro; during catalysis it could stabilize the positive charge on His41 by mediating its electrostatic interaction with the negatively charged Asp187. We note that this potentially crucial water molecule is absent from some ligand-free X-ray structures of SARS-CoV-2 3CL Mpro determined at 100 K (e.g., PDB ID 6M03).
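The relay role proposed for H2Ocat can be made concrete by treating the polar contacts described above as an undirected graph. The sketch below encodes only the contacts named in the text (His41 to H2Ocat, H2Ocat to His164 and Asp187, Asp187 to Arg40), deliberately omits the 3.8 Å Cys145/His41 pair as non-hydrogen-bonded, and uses breadth-first search to show that in this toy graph His41 reaches Asp187 only through H2Ocat. The graph representation and the names `CONTACTS`, `build_graph` and `shortest_path` are illustrative assumptions, not an analysis from the study.

```python
from collections import deque

# Polar contacts named in the text; Cys145-His41 (3.8 A) is omitted as non-H-bonded.
CONTACTS = [
    ("His41", "H2Ocat"),
    ("H2Ocat", "His164"),  # 3.0 A
    ("H2Ocat", "Asp187"),  # 2.9 A
    ("Asp187", "Arg40"),   # salt bridge
]


def build_graph(edges, exclude=()):
    """Undirected adjacency map, skipping any edge that touches an excluded node."""
    graph = {}
    for a, b in edges:
        if a in exclude or b in exclude:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(build_graph(CONTACTS), "His41", "Asp187"))
print(shortest_path(build_graph(CONTACTS, exclude={"H2Ocat"}), "His41", "Asp187"))
```

With the water present the path is His41, H2Ocat, Asp187; removing H2Ocat from this toy graph leaves His41 with no route to Asp187.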
Unsurprisingly, a significant number of reports have already used 100 K X-ray structures of ligand-free 3CL Mpro for molecular docking of various small molecules, including many therapeutics approved to treat other diseases. We superimposed our room-temperature structure of 3CL Mpro on one obtained at 100 K (PDB ID 6Y2E).18 Although the overall structures are similar, with a Cα RMSD of 0.32 Å (Figure 3a), the conformation of residues 192–198 differs between the room-temperature and 100 K structures (Figure 3b). In the room-temperature structure, the Ala194 peptide bond is flipped to point into the P5 inhibitor-binding pocket, adopting a conformation similar to that seen in 3CL Mpro in complex with the inhibitor N3 (PDB ID 6LU7).19 The conformations of Thr196 and Asp197 also differ significantly between the two structures: the backbone carbonyl oxygen of Thr196 is displaced by 1.3 Å, the Cγ atoms of Asp197 are 1.9 Å apart, and the backbone carbonyl oxygen of Asp197 is displaced by 2.6 Å. The conformations observed in the ligand-free enzyme at room temperature may be more relevant for screening potential drug candidates.
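A Cα RMSD such as the 0.32 Å quoted above is computed after optimally superimposing the two models. A common way to do this is the Kabsch algorithm, sketched below with NumPy. It assumes the two arrays list matched Cα atoms in the same order; crystallographic programs may add outlier rejection or weighting, so this is not meant to reproduce the exact protocol used here. The function name `kabsch_rmsd` and the random test coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid-body fit."""
    P = P - P.mean(axis=0)                  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                             # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would indicate a reflection
    D = np.diag([1.0, 1.0, d])
    rot = Vt.T @ D @ U.T                    # proper rotation mapping P onto Q
    diff = P @ rot.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Usage check with hypothetical coordinates: a rotated and translated copy
# of the same "C-alpha" set should superimpose with an RMSD of ~0.
rng = np.random.default_rng(0)
ca_a = rng.normal(size=(50, 3)) * 10.0
theta = np.deg2rad(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
ca_b = ca_a @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(f"{kabsch_rmsd(ca_a, ca_b):.6f}")
```

The determinant check keeps the fitted transformation a proper rotation; without it, the SVD can return a mirror image that gives an artificially low RMSD.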
It is also instructive to compare our room-temperature structure of the protease with that of an inhibitor-bound complex. For this comparison, we chose the complex with N3, a long peptidomimetic inhibitor,19 because its substituents span all substrate-binding subsites, including P4 and P5, so that it closely resembles an actual substrate. Figure 4 shows the superposition of the two structures. The comparison reveals substantial plasticity of the enzyme near the active site: to accommodate the inhibitor, several secondary-structure elements move more than 1 Å from their positions in the room-temperature structure of the ligand-free form. These conformational changes can be characterized as induced fit upon ligand binding.
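Finding the elements that move by more than 1 Å amounts to computing per-residue displacements between the two superimposed models and grouping consecutive residues above a cutoff. The sketch below assumes the coordinates are already superimposed (for example with a Kabsch fit) and matched by residue number, and uses one Cα per residue. The function name `moved_segments`, the 1.0 Å default and the toy data are illustrative.

```python
import numpy as np


def moved_segments(resids, ca_ref, ca_bound, cutoff=1.0):
    """Return (first, last) residue-number ranges whose C-alpha atoms move
    more than `cutoff` angstroms between two pre-superimposed models."""
    disp = np.linalg.norm(np.asarray(ca_bound) - np.asarray(ca_ref), axis=1)
    segments = []
    for resid, d in zip(resids, disp):
        if d <= cutoff:
            continue
        if segments and resid == segments[-1][1] + 1:
            segments[-1][1] = resid            # extend the current run
        else:
            segments.append([resid, resid])    # start a new run
    return [tuple(s) for s in segments]


# Toy data: residues 44-52, with a hypothetical 1.5 A shift of residues 46-50.
resids = list(range(44, 53))
ref = np.zeros((len(resids), 3))
bound = ref.copy()
bound[2:7, 0] += 1.5
print(moved_segments(resids, ref, bound))
```

For the toy input this reports a single segment, residues 46 to 50. Reporting runs rather than isolated residues makes it easier to relate the result to secondary-structure elements.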
Upon ligand binding, the small helix near the P2 group (residues 46–50) and the β-hairpin loop near the P3-P4 substituents (residues 166–170) move 2.4 Å apart, whereas the P5 loop (residues 190–194) moves closer to the P3-P4 loop. Two methionines, Met49 and Met165, change their side-chain conformations in the complex to avoid clashing with the inhibitor's leucine at P2. The change in the Met49 conformation in turn propagates to the side chains of Ser46 and Leu50. More dramatic conformational changes due to inhibitor binding occur at the enzyme's C-termini. Unexpectedly, the C-terminal tail (residues Ser301 through Gln306) swings 180° away from its position in the room-temperature ligand-free structure and sits above the helical domain in the N3-bound form (Figure S2).
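One simple way to quantify a statement such as "two elements move 2.4 Å apart" is to compare the distance between the centroids of the two segments in each structure. The text does not say how the shift was measured, so this is only one plausible metric; the function names `centroid_distance` and `separation_change` and the toy coordinates are hypothetical.

```python
import numpy as np


def centroid_distance(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Distance (angstroms) between the centroids of two (N, 3) atom sets."""
    return float(np.linalg.norm(seg_a.mean(axis=0) - seg_b.mean(axis=0)))


def separation_change(apo_a, apo_b, bound_a, bound_b) -> float:
    """Positive if the segments move apart on binding, negative if closer."""
    return centroid_distance(bound_a, bound_b) - centroid_distance(apo_a, apo_b)


# Toy segments standing in for residues 46-50 and 166-170 (hypothetical coordinates).
helix_apo = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
hairpin_apo = helix_apo + np.array([0.0, 8.0, 0.0])
hairpin_bound = hairpin_apo + np.array([0.0, 2.4, 0.0])

print(round(separation_change(helix_apo, hairpin_apo, helix_apo, hairpin_bound), 2))
```

Because each distance is measured within a single model, the result does not depend on how the apo and bound models are superimposed.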
This drastic flip of the C-terminal loop eliminates several hydrogen bonds that form part of the dimer interface in the ligand-free form, which may partly destabilize the dimer in the inhibitor-bound form. To assess the flexibility of these regions, we performed a 1 µs molecular dynamics (MD) simulation of ligand-free 3CL Mpro. In this simulation, the same regions, including the P2 helix (residues 45–50), the P5 loop (residues 190–194), and the C-terminal tail, are the most dynamic, showing the largest root-mean-square fluctuations (RMSF) (Figure S3). These regions therefore appear quite malleable and may be able to accommodate various chemical groups at the P2-P5 sites of inhibitors.
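The RMSF of an atom is the root-mean-square deviation of its position from its time-averaged position over the trajectory, RMSF_i = sqrt(⟨|x_i(t) − ⟨x_i⟩|²⟩). The sketch below assumes the frames have already been aligned to a common reference, as MD analysis tools typically do before computing RMSF, and that coordinates are held in a NumPy array of shape (frames, atoms, 3); the function name `rmsf` and the toy trajectory are hypothetical.

```python
import numpy as np


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom RMSF for an aligned trajectory of shape (n_frames, n_atoms, 3)."""
    mean_pos = traj.mean(axis=0)                    # time-averaged structure
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared deviation per frame and atom
    return np.sqrt(sq_dev.mean(axis=0))


# Toy trajectory: atom 0 is rigid, atom 1 oscillates by +/-1 A along x.
frames = np.zeros((4, 2, 3))
frames[:, 1, 0] = [1.0, -1.0, 1.0, -1.0]
print(rmsf(frames))
```

Averaging the per-atom values over each residue, or keeping only Cα atoms, turns this into a per-residue profile of the kind used to identify the most dynamic regions.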
The conformational flexibility of the active site, revealed by comparing the room-temperature ligand-free structure reported here with previously reported low-temperature ligand-free and inhibitor-bound structures, leads us to suggest that room-temperature structures of ligand-free 3CL Mpro may be more physiologically relevant for molecular docking studies aimed at estimating drug binding and enabling drug design.