Identification of lineage-specific gene family expansions in a database of gene families
Gene families specific to, or with significantly changed membership in, particular lineages compared to outgroups may reflect important lineage-specific changes in biology. Here we describe a computational protocol to identify gene families whose gene count varies greatly across a species tree. The protocol uses three metrics to capture different aspects of this variability, and calculates each of them for every family in an in-house database of gene families (e.g., one built using the Ensembl Compara pipeline). One metric (Cv) identifies families whose gene count varies widely across the species tree; the other two (Emax, Zmax) identify families with an elevated gene count in a particular clade of the species tree. The protocol also controls for differences in gene counts that arise from fragmented assemblies.
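To make the three metrics more concrete, the following is a minimal sketch of how such per-family scores might be computed from a table of gene counts per species. The abstract does not give the formulas, so everything here is an assumption made for illustration: Cv is treated as a coefficient of variation (standard deviation divided by mean), Emax as the largest ratio of mean gene count inside a clade to the mean outside it, and Zmax as the largest z-score of a clade's mean against the remaining species. The function names, species labels, and clade definitions are hypothetical, and the sketch does not model the correction for fragmented assemblies.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Assumed Cv: population standard deviation of gene counts over their mean."""
    values = list(counts.values())
    mu = mean(values)
    return pstdev(values) / mu if mu > 0 else 0.0


def clade_scores(counts, clade):
    """Return (enrichment, z) for one clade versus all species outside it.

    Assumed definitions: enrichment is mean(inside) / mean(outside);
    z is (mean(inside) - mean(outside)) / stdev(outside).
    """
    inside = [n for species, n in counts.items() if species in clade]
    outside = [n for species, n in counts.items() if species not in clade]
    mu_in, mu_out = mean(inside), mean(outside)
    sd_out = pstdev(outside)

    enrichment = mu_in / mu_out if mu_out > 0 else float("inf")
    if sd_out > 0:
        z = (mu_in - mu_out) / sd_out
    elif mu_in == mu_out:
        z = 0.0
    else:
        z = float("inf") if mu_in > mu_out else float("-inf")
    return enrichment, z


def max_clade_scores(counts, clades):
    """Return ((e_max, clade_name), (z_max, clade_name)) over all candidate clades."""
    scores = {name: clade_scores(counts, members) for name, members in clades.items()}
    e_name = max(scores, key=lambda name: scores[name][0])
    z_name = max(scores, key=lambda name: scores[name][1])
    return (scores[e_name][0], e_name), (scores[z_name][1], z_name)


# Hypothetical gene counts for one family in five species.
family_counts = {"sp_a": 12, "sp_b": 10, "sp_c": 1, "sp_d": 2, "sp_e": 3}

# Hypothetical clades of the species tree, each given as a set of species.
candidate_clades = {
    "clade_ab": {"sp_a", "sp_b"},
    "clade_cde": {"sp_c", "sp_d", "sp_e"},
}

cv = coefficient_of_variation(family_counts)
(e_max, e_clade), (z_max, z_clade) = max_clade_scores(family_counts, candidate_clades)
print(f"Cv={cv:.3f}  Emax={e_max:.2f} ({e_clade})  Zmax={z_max:.2f} ({z_clade})")
```

Under these assumed definitions, a family with a high Cv but no clade with a high Emax or Zmax would vary in size without the variation being concentrated in any one lineage, which is one reason a protocol might report all three scores rather than one.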
Figure 1
Figure 2
Figure 3
Posted 14 May, 2018
© Research Square 2021