G protein-coupled receptors (GPCRs) are a superfamily of plasma membrane receptors with diverse functions in parasitic worms (helminths), including neuromuscular signaling, chemosensation, and development. Because GPCRs are the most widely exploited class of drug targets in humans, they have been proposed as possible next-generation targets for anthelmintics. Identifying and annotating GPCRs in helminth genomes is therefore often an immediate priority. Here, we present a computational pipeline for GPCR identification and annotation that leverages the comparative genomics performed for the recent release of over 50 helminth genomes.
The most robust methods for GPCR identification often use structural information alone for a first-pass identification of putative GPCRs. GPCRs have a canonical structure that includes seven transmembrane regions, and as such they are readily identified in genomic open reading frames using genome-wide transmembrane domain predictions. However, this approach is computationally intensive for a single genome, let alone >100 genomes. It is also not optimized for highly fragmented genomes whose assembly has been confounded by high concentrations of repetitive regions and high AT content, features that are often found in helminth genomes.
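To make the structure-first idea concrete, the following is a minimal sketch that counts hydrophobic stretches in a protein sequence with a sliding-window Kyte-Doolittle hydropathy average. It is a crude stand-in for a dedicated transmembrane domain predictor, not the method used by any particular pipeline; the function name, the 19-residue window, and the 1.6 threshold are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def count_hydrophobic_segments(protein, window=19, threshold=1.6):
    """Count runs of consecutive windows whose mean hydropathy >= threshold.

    Assumption: each such run approximates one membrane-spanning helix.
    Unknown residue codes contribute 0.0 to the window average.
    """
    if len(protein) < window:
        return 0
    segments = 0
    in_segment = False
    for start in range(len(protein) - window + 1):
        chunk = protein[start:start + window]
        score = sum(KD.get(aa, 0.0) for aa in chunk) / window
        if score >= threshold:
            if not in_segment:  # a new hydrophobic run begins here
                segments += 1
                in_segment = True
        else:
            in_segment = False
    return segments


# Synthetic example: seven 22-residue leucine stretches separated by
# 10-residue aspartate loops (not a real protein sequence).
toy = ("L" * 22 + "D" * 10) * 7
print(count_hydrophobic_segments(toy))  # -> 7
```

Running a scan of this kind over every predicted open reading frame is what makes the structure-first approach costly, and a gene model split across contigs will present fewer than seven segments, which is one way fragmented assemblies lead to missed receptors.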
Thus, we devised an alternative strategy that leveraged the rich comparative genomic data available at WormBase ParaSite. Using previously identified helminth and free-living worm GPCRs as seeds, we developed a homology-based, family-centric approach for identifying conserved GPCR families and filtering out false positives (Figure 1).
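The seed-and-family idea can be sketched in a few lines. The example below assumes that gene-to-family assignments have already been exported into a plain dictionary and that a predicted transmembrane count is available for each gene. All names, the data layout, and the 5 to 8 transmembrane filter are hypothetical choices made for illustration; the actual false-positive filter is not specified in the text above.

```python
from collections import defaultdict


def seeded_families(family_of, seeds):
    """Return {family_id: sorted members} for families containing a seed GPCR.

    family_of: dict mapping gene ID -> family ID (hypothetical export format).
    seeds: set of gene IDs of previously identified GPCRs.
    """
    members = defaultdict(list)
    for gene, family in family_of.items():
        members[family].append(gene)
    seeded_ids = {family_of[s] for s in seeds if s in family_of}
    return {fid: sorted(members[fid]) for fid in sorted(seeded_ids)}


def filter_candidates(families, seeds, tm_counts, min_tm=5, max_tm=8):
    """Keep non-seed family members with a plausible transmembrane count.

    Assumption: a TM-count range is used as the false-positive filter here
    purely for illustration. Genes without a prediction are dropped.
    """
    kept = {}
    for fid, genes in families.items():
        passing = [
            g for g in genes
            if g not in seeds and min_tm <= tm_counts.get(g, 0) <= max_tm
        ]
        if passing:
            kept[fid] = passing
    return kept


# Toy data with made-up identifiers.
family_of = {
    "seedA": "fam1", "geneB": "fam1", "geneC": "fam1",
    "geneD": "fam2",
    "seedE": "fam3", "geneF": "fam3",
}
seeds = {"seedA", "seedE"}
tm_counts = {"geneB": 7, "geneC": 2, "geneD": 7, "geneF": 6}

families = seeded_families(family_of, seeds)
print(filter_candidates(families, seeds, tm_counts))
# -> {'fam1': ['geneB'], 'fam3': ['geneF']}
```

In this toy run, geneD is never considered despite its seven predicted transmembrane regions because its family contains no seed, and geneC is discarded as a likely false positive. Working at the family level means transmembrane prediction is only needed for members of seeded families rather than for every open reading frame.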