DNA metabarcoding is a promising method for describing communities and estimating biodiversity. This approach uses high-throughput sequencing of targeted markers to identify species in a complex sample. By convention, sequences are clustered at a predefined sequence divergence threshold (often 3%) into operational taxonomic units (OTUs) that serve as proxies for species. However, levels of interspecific marker variation differ across taxonomic groups, which makes clustering sequences from a phylogenetically diverse dataset into OTUs at a uniform threshold problematic. In this study, we use mock zooplankton communities to evaluate the accuracy of species richness estimates when following conventional protocols to cluster hypervariable sequences of the V4 region of the small subunit ribosomal RNA gene (18S) into OTUs. By including individually tagged single specimens and "populations" of various species in our communities, we examine the impact of intra- and interspecific diversity on OTU clustering. Communities consisting of a single individual per species generated a correspondence of 59–84% between OTU number and species richness at a 3% divergence threshold. However, when multiple individuals per species were included, the correspondence between OTU number and species richness dropped to 31–63%. Our results suggest that intraspecific variation in this marker can often exceed 3%, such that a single species does not always correspond to one OTU. We advocate applying group-specific divergence thresholds when analyzing complex and taxonomically diverse communities, and we encourage the development of additional filtering steps that identify artifactual rRNA gene sequences or pseudogenes that may generate spurious OTUs.
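To illustrate why a uniform threshold can split one species into several OTUs, the following is a minimal sketch of greedy, seed-based clustering at a fixed divergence threshold. It is not the pipeline used in the study: the function names (`p_distance`, `greedy_otus`), the toy reads, and the assumption that sequences are already aligned to equal length are all hypothetical and chosen only for illustration.

```python
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def greedy_otus(seqs: List[str], threshold: float = 0.03) -> List[List[int]]:
    """Greedy clustering: each sequence joins the first OTU whose seed lies
    within `threshold`; otherwise it becomes the seed of a new OTU.
    Returns a list of OTUs, each a list of indices into `seqs`."""
    seeds: List[int] = []
    clusters: List[List[int]] = []
    for i, seq in enumerate(seqs):
        for k, seed in enumerate(seeds):
            if p_distance(seq, seqs[seed]) <= threshold:
                clusters[k].append(i)
                break
        else:
            # No existing seed is close enough, so start a new OTU.
            seeds.append(i)
            clusters.append([i])
    return clusters


# Hypothetical toy reads (100 sites each), not data from the study.
toy_reads = [
    "A" * 100,                    # species A, individual 1
    "C" * 4 + "A" * 96,           # species A, individual 2 (4% from individual 1)
    "G" * 50 + "A" * 50,          # species B, individual 1
    "G" * 50 + "A" * 49 + "T",    # species B, individual 2 (1% from individual 1)
]

print(len(greedy_otus(toy_reads, threshold=0.03)))
print(len(greedy_otus(toy_reads, threshold=0.05)))
```

In this toy case the two species yield three OTUs at a 3% threshold, because the two individuals of species A differ by 4%, and two OTUs at a 5% threshold. Real pipelines typically work on unaligned reads with abundance sorting and chimera filtering, which this sketch omits.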