Comparative analyses of the metabolism of various species lead to a better understanding of the biological similarities and differences among them, and could provide insights into their niche-specific adaptations. Moreover, when contrasted with the host organism’s metabolism, such comparisons are likely to yield important insights that may be leveraged to specifically target parasites.
Figure 1. Overview of the protocol. The dashed lines represent an optional step to ascertain lenient completion of modules for applications sensitive to false negatives. (See figure in Figures section.)
As shown in Figure 1, many different approaches can be taken for metabolic comparisons. In this protocol we describe several analyses, especially in the context of the work done for the 50 helminth initiative. Since the primary objective was to obtain a large-scale overview of the metabolic potential of different groups of helminths, many of the approaches conclude with a phylogenetic analysis.
The Pathway Tools package [1] was used to reconstruct an individual metabolic network for each species based on reference pathways in the BioCyc database [2]. These networks can then be analyzed to discover vitamin and amino acid auxotrophies.
The KEGG database [3] also provides reference pathways, many of which are relevant to helminth biology. These pathways can be reconstructed for each species from its input ECs (Enzyme Commission numbers). The resulting species-specific metabolic networks can then be compared either by overall coverage (i.e. the percentage of a pathway's enzymes that are present) or by the variation in the extent of coverage within a species group. The networks can also be analyzed to identify species-specific chokepoint enzymes. A chokepoint in a directed network is a node with either a single outgoing or a single incoming edge; in metabolic terms, the chokepoint enzyme either uniquely consumes or uniquely produces a substrate, which makes it an especially interesting drug target. The identified chokepoints can then be used to generate a phylogenetic tree to see whether closely related species tend to share chokepoints.
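The coverage measure and the chokepoint definition above can be illustrated with a minimal Python sketch. It assumes a hypothetical input format in which each EC number maps to a pair of substrate and product lists; the function names and the toy reactions are illustrative only and are not part of the protocol's actual tooling.

```python
from collections import defaultdict


def pathway_coverage(species_ecs, pathway_ecs):
    """Percentage of a reference pathway's ECs annotated in a species."""
    pathway_ecs = set(pathway_ecs)
    if not pathway_ecs:
        return 0.0
    return 100.0 * len(pathway_ecs & set(species_ecs)) / len(pathway_ecs)


def find_chokepoints(reactions):
    """Return ECs that uniquely consume or uniquely produce a metabolite.

    reactions: dict mapping EC -> (substrates, products); this layout is
    an assumption made for the sketch.
    """
    consumers = defaultdict(set)  # metabolite -> ECs that consume it
    producers = defaultdict(set)  # metabolite -> ECs that produce it
    for ec, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(ec)
        for metabolite in products:
            producers[metabolite].add(ec)

    chokepoints = set()
    for ecs in list(consumers.values()) + list(producers.values()):
        if len(ecs) == 1:  # only one enzyme touches this side of the metabolite
            chokepoints |= ecs
    return chokepoints


# Toy network: two enzymes convert A to B, one enzyme converts B to C.
toy_reactions = {
    "1.1.1.1": (["A"], ["B"]),
    "2.2.2.2": (["A"], ["B"]),
    "3.3.3.3": (["B"], ["C"]),
}
toy_chokepoints = find_chokepoints(toy_reactions)
toy_coverage = pathway_coverage(
    {"1.1.1.1", "3.3.3.3"},
    {"1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"},
)
```

In the toy network only the third enzyme is a chokepoint, because it is the sole consumer of B and the sole producer of C. A real analysis would likely need extra filtering (for example, of dead-end metabolites), which this sketch does not attempt.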
KEGG also defines smaller networks, which may be part of multiple larger pathways. These are called “metabolic modules”. Because they are relatively small, they are more amenable to topological analysis. We use each species’ enzyme annotations to find which KEGG metabolic modules are “complete” in that species (i.e. every enzyme needed to produce the final product from the initial substrate(s) is annotated). This analysis follows Tyagi et al. [4]. The identified complete modules can then be used to generate a phylogenetic tree, which can reveal unexpected evolutionary patterns in the metabolic potential of helminths. Since every indispensable enzymatic step in a module must be present for the module to be deemed complete, this analysis is especially sensitive to false negatives. To guard against this, a “lenient completion” is also analyzed, which allows up to one enzyme to be absent, potentially due to misannotation or missed gene calls [4]. Any phylogenetic peculiarities identified using module completion that are also supported by the lenient completion analysis are likely to be true.
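The distinction between strict and lenient completion can be sketched as follows. This is a simplified illustration, not the implementation of Tyagi et al.: it assumes each module is represented as an ordered list of steps, where each step is a set of alternative ECs and any one of them satisfies the step. Real KEGG module definitions are more elaborate logical expressions, and the names `missing_steps` and `module_status` are hypothetical.

```python
def missing_steps(module_steps, species_ecs):
    """Indices of module steps for which no alternative EC is annotated."""
    species_ecs = set(species_ecs)
    return [
        index
        for index, alternatives in enumerate(module_steps)
        if not (set(alternatives) & species_ecs)
    ]


def module_status(module_steps, species_ecs, max_missing=1):
    """Classify a module as 'complete', 'lenient', or 'incomplete'.

    max_missing=1 mirrors the lenient rule of tolerating one absent enzyme.
    """
    n_missing = len(missing_steps(module_steps, species_ecs))
    if n_missing == 0:
        return "complete"
    if n_missing <= max_missing:
        return "lenient"
    return "incomplete"


# Toy three-step module; the first step has two alternative enzymes.
toy_module = [{"2.7.1.1", "2.7.1.2"}, {"5.3.1.9"}, {"2.7.1.11"}]

status_a = module_status(toy_module, {"2.7.1.2", "5.3.1.9", "2.7.1.11"})
status_b = module_status(toy_module, {"2.7.1.1", "2.7.1.11"})
status_c = module_status(toy_module, {"2.7.1.1"})
```

Here the first species completes the module, the second lacks only one step and so passes the lenient test, and the third lacks two steps and is treated as incomplete under both criteria.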
Using just the EC numbers associated with each species, one could directly generate phylogenetic trees that offer a simple overview of the evolution of metabolic potential. In general, however, data at this level are noisier, and more insight is obtained by examining the data at the module or pathway level.
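As a rough sketch of how EC presence/absence data could feed a tree-building step, the snippet below computes pairwise distances between species from their EC sets. The text does not specify a distance measure, so the Jaccard distance used here is an assumption, as are the function names and the toy species labels. The resulting matrix could then be passed to any standard clustering or tree-construction method.

```python
from itertools import combinations


def jaccard_distance(ecs_a, ecs_b):
    """1 - |intersection| / |union| of two EC sets (0.0 if both are empty)."""
    ecs_a, ecs_b = set(ecs_a), set(ecs_b)
    union = ecs_a | ecs_b
    if not union:
        return 0.0
    return 1.0 - len(ecs_a & ecs_b) / len(union)


def distance_matrix(ec_sets):
    """Symmetric nested-dict distance matrix from {species: set of ECs}."""
    names = sorted(ec_sets)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(ec_sets[x], ec_sets[y])
        dist[x][y] = d
        dist[y][x] = d
    return dist


# Toy EC annotations for three hypothetical species.
toy_ec_sets = {
    "species_1": {"1.1.1.1", "2.7.1.1", "5.3.1.9"},
    "species_2": {"1.1.1.1", "2.7.1.1", "4.1.2.13"},
    "species_3": {"3.5.1.5"},
}
toy_dist = distance_matrix(toy_ec_sets)
```

In this toy case the first two species share two of four distinct ECs, giving a distance of 0.5, while the third shares none with either and sits at the maximum distance of 1.0.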