Data from: Worth the fuss? Maximising informativeness for target capture-based phylogenomics in Erica (Ericaceae)
Plant phylogenetics has been revolutionised in the genomic era, with target capture acting as the
primary workhorse of most recent research in the new field of phylogenomics. Target capture (also known as
Hyb-Seq) allows researchers to sequence hundreds of genomic regions (loci) of their choosing, at
relatively low cost per sample, from which to derive phylogenetically informative data. Although
this highly flexible and widely applicable method has rightly earned its place as the field’s de facto
standard, it does not come without its challenges. In particular, users have to specify which loci
to sequence—a surprisingly difficult task, especially when working with non-model groups, as it
requires pre-existing genomic resources in the form of assembled genomes and/or transcriptomes.
In the absence of taxon-specific genomic resources, target sets exist that are designed to work
across broad taxonomic scales. However, the highly conserved loci that they target may lack
informativeness for difficult phylogenetic problems, such as that presented by the rapid radiation
of Erica in southern Africa. We designed a target set for Erica phylogenomics intended to
maximise informativeness and minimise paralogy while maintaining universality by including genes
from the widely used Angiosperms353 set. Comprising just over 300 genes, the targets had
excellent recovery rates in roughly 90 Erica species as well as outgroups from Calluna, Daboecia,
and Rhododendron, and had high information content as measured by parsimony-informative
sites and Quartet Internode Resolution Probability (QIRP) at shallow nodes. Notably, QIRP was
positively correlated with intron content, while including introns in targets—rather than recovering
them via exon-flanking “bycatch”—substantially improved intron recovery. Overall, our results
show the value of building a custom target set, and we provide a suite of open-source tools that
can be used to replicate our approach in other groups (github.com/SethMusker/TargetVet).
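
As an illustration of one of the informativeness measures mentioned above, the sketch below counts parsimony-informative sites in a small alignment. It is not taken from TargetVet: the function name and the toy alignment are hypothetical, and it assumes the usual definition (a site is parsimony-informative when at least two different nucleotides each occur in at least two sequences), with gaps and ambiguity codes ignored.

```python
from collections import Counter


def count_parsimony_informative_sites(alignment):
    """Count parsimony-informative columns in a list of aligned sequences.

    Hypothetical helper for illustration only. Assumes all sequences have
    the same aligned length; characters other than A, C, G, T (gaps,
    ambiguity codes) are ignored when tallying each column.
    """
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    valid = set("ACGT")
    informative = 0
    for column in zip(*alignment):
        # Tally unambiguous nucleotides in this column.
        counts = Counter(b.upper() for b in column if b.upper() in valid)
        # Informative if at least two states are each seen at least twice.
        states_seen_twice = sum(1 for n in counts.values() if n >= 2)
        if states_seen_twice >= 2:
            informative += 1
    return informative


# Toy alignment (made up for this example, not real Erica data).
example = ["ACGTA", "ACGTC", "ATGAC", "ATG-A"]
n_informative = count_parsimony_informative_sites(example)
```

In the toy alignment only the second and fifth columns qualify; the fourth does not, because once the gap is ignored the nucleotide A occurs just once.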