Estimation of species range fragmentation

When discussing species distribution patterns along spatial, temporal or functional gradients, one often finds a hump-shaped species richness pattern. This is well documented in many taxa and at many spatial scales and is known as the mid-domain effect (e.g. Colwell & Lees 2000).

These patterns also arise stochastically (depending on the species range frequency distribution), so random distributions of species range midpoints are used as null models to test for mid-domain effects.

However, these models assume unfragmented species ranges. With a highly fragmented species range the pattern would look quite dissimilar.

So I don't want to start a discussion on the validity of this concept but rather ask:

Given a gradient and known species occurrences on that gradient, has anyone read an article about (or does anyone have an idea for) a good/unbiased* estimator of species range fragmentation?


*e.g. one that doesn't over- or underestimate fragmentation for rare or abundant species

Very interesting question +1. I don't know the literature on the subject very well but I did not find much by looking for it. I know that a number of methods exist when you have genetic data (STRUCTURE or some of the work of J. Novembre probably).

Here are two possible solutions

Fitting $x$ distributions

You could fit 1, 2, 3, …, n normal (or uniform) distributions to the observed data. Each time, compute the maximum likelihood (for which you might need an MCMC with $2x$ parameters, where $1 \le x \le n$ is the number of distributions you're fitting), then select the "best" model with some information criterion such as AIC or BIC.

The number of fragments is just the $x$ value associated with the lowest AIC.
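As a rough illustration of this idea, here is a minimal sketch that fits 1D normal mixtures by EM (rather than MCMC) and picks the number of components by BIC. The function names `fit_gmm_1d` and `select_n_fragments` are made up for this example, the starting values are a simple deterministic guess, and the parameter count assumes a mean, a standard deviation and a weight per component (3k − 1 free parameters), so treat it as a starting point rather than a tested estimator.

```python
import math

def fit_gmm_1d(xs, k, n_iter=200):
    """Fit a k-component 1D normal mixture by EM; return the final log-likelihood.

    Assumes len(xs) >= k. Hypothetical helper for illustration only.
    """
    xs = sorted(xs)
    n = len(xs)
    data_range = (xs[-1] - xs[0]) or 1.0
    # Deterministic start: split the sorted data into k equal-sized chunks.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    sds = [data_range / (2 * k)] * k
    weights = [1.0 / k] * k
    sd_floor = 1e-3 * data_range  # stops a component collapsing onto one point
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp, loglik = [], 0.0
        for x in xs:
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), sd_floor)
    return loglik

def select_n_fragments(xs, max_k=4):
    """Return (best_k, {k: BIC}) for mixtures with 1..max_k components."""
    n = len(xs)
    bics = {}
    for k in range(1, max_k + 1):
        n_params = 3 * k - 1  # k means + k sds + (k - 1) free weights
        bics[k] = n_params * math.log(n) - 2 * fit_gmm_1d(xs, k)
    return min(bics, key=bics.get), bics
```

Calling `select_n_fragments(positions)` on a list of occurrence positions along the gradient returns the component count with the lowest BIC together with all BIC values. Swapping the penalty `n_params * math.log(n)` for `2 * n_params` would give AIC instead.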

Logistic regression

Another (faster and simpler) solution would be to fit a logistic regression to your data.

Iteratively fit a polynomial logistic regression of degree 1, 2, 3, …, n and then again use some information criterion to select the 'best'.

To find the number of fragments, you can then either use the degree of the selected model or, even better, apply a threshold to the probability of getting a zero (which you could compute with the effects package in R).
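The thresholding step could look something like the sketch below. It assumes you already have predicted presence probabilities (one minus the probability of a zero) at evenly spaced points along the gradient from whatever model you fitted; the function name `count_fragments` and the default threshold of 0.5 are arbitrary choices for illustration.

```python
def count_fragments(presence_prob, threshold=0.5):
    """Count contiguous runs along the gradient where the predicted
    presence probability is at or above the threshold."""
    fragments = 0
    inside = False  # are we currently inside a predicted-presence run?
    for p in presence_prob:
        present = p >= threshold
        if present and not inside:
            fragments += 1  # a new run starts here
        inside = present
    return fragments
```

For example, `count_fragments([0.1, 0.8, 0.9, 0.2, 0.1, 0.7, 0.6, 0.3])` counts two separate runs above 0.5. The result will depend on the threshold, so it is worth checking how stable the count is over a range of thresholds.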


It will probably take you a day or so if you're at ease with these methods and have introductory programming knowledge.

You might want to get opinions from stats.SE as well.

Impact of cost distance and habitat fragmentation on the daily path length of Rhinopithecus bieti

An understanding of primate movement patterns in response to natural and anthropogenically induced changes in habitat heterogeneity, food availability, and plant species distribution is essential for developing effective management and conservation programs. Therefore, from July 2013 to June 2014, we examined the effects of landscape configuration on the ranging behavior (daily path length, DPL) of the Endangered Yunnan snub-nosed monkey (Rhinopithecus bieti) in the Baimaxueshan National Nature Reserve (27°34'N, 99°17'E) in Gehuaqing, China. Given the extreme difficulties in following the study group across high-altitude mountainous terrain spanning elevations of 2,500-4,000 m, we were only able to collect DPL using 3-4 GPS points per day on 21 individual days. We found that R. bieti traveled the shortest DPL in winter (1,141.31 m), followed by spring (2,034.06 m) and autumn (2,131.19 m). The cost distance, a statistical tool designed to estimate the difficulty of a species moving across its distributional range, was lowest in autumn (205.47), followed by spring (225.93) and winter (432.59) (one-way ANOVA: F = 3.852, P = 0.026, df = 2). The habitat fragmentation index (HFI), which measures the density of forest patches, indicated areas visited in the winter were more fragmented (HFI = 2.16) compared to spring (HFI = 1.83) or autumn (HFI = 1.3). Although our results should be considered preliminary, they suggest that both the availability of suitable travel routes and habitat fragmentation, driven by high-intensity human disturbance, constrain the movement of R. bieti. We found that undisturbed areas of the band's range contained a high density of lichens, which represent a nutritious, abundant, and year-round food source for Yunnan snub-nosed monkeys. In order to protect this Endangered species, we recommend that researchers construct detailed maps of landscape heterogeneity, particularly habitat connectivity, forest fragmentation, and seasonal variation in the location of major food patches, to better understand and mitigate the effects of seasonal habitat change on patterns of R. bieti habitat utilization and population viability.

Keywords: Cost-distance model; Daily path length; Habitats fragmentation; Human disturbance; Landscape heterogeneity; Primates; Ranging behavior; Rhinopithecus bieti; Seasonal variation of habitat; Spatial analyst.

Conflict of interest statement

The authors declare there are no competing interests.


Figure 1. Study area, vegetation distribution and the activity points of R. bieti in different…

Data Requirements

The types of data required to achieve inventory or monitoring objectives should be the primary consideration in selecting field techniques. Four categories of data collection are discussed below along with some suggestions for selecting appropriate field techniques for each.

Occurrence and distribution data

For some population studies, simply determining whether a species is present in an area is sufficient for conducting the planned data analysis. For example, biologists attempting to conserve a threatened salamander may need to monitor the extent of the species’ range and degree of population fragmentation on a land ownership. One hypothetical approach is to map all streams in which the salamander is known to be present, as well as additional streams that may qualify as the habitat type for the species in the region. To monitor changes in salamander distribution, data collection could consist of a survey along randomly selected reaches in each of the streams to determine if at least one individual (or some alternative characteristic such as egg mass) is present. Using only a list that includes the stream reach (i.e., the unique identifier), the survey year, and an occupancy indicator variable, a biologist could prepare a time series of maps displaying all of the streams by year and distinguish the subset of streams that were known to be occupied by the salamander. Such an approach could support a qualitative assessment of changes in the species distribution pattern, thereby attaining the program’s objectives, and generate new hypotheses as to the cause of the observed changes.

It is far easier to determine if there is at least one individual of the target species on a sampling unit than it is to count all of the individuals. Determining with confidence that a species is not present on a sampling unit also requires more intensive sampling than collecting count or frequency data because it is so difficult to dismiss the possibility that an individual eluded detection. Probability of occurrence can be estimated using approaches such as those described by MacKenzie and Royle (2005). MacKenzie (2005) offered an excellent overview for managers of the trade-off between number of units sampled per year and the number of years (or other unit of time) for which the study is to be conducted. The variation in the estimated trend in occupancy decreases as the number of years of data collection increases (Fig. 8.1). A similar level of precision can be achieved by surveying more units over fewer years vs. surveying fewer units over a longer period.

Figure 8.1. Simulation-based coefficient of variation for estimated trend in occupancy (on the logistic scale) where 50, 100, or 200 landscapes are each surveyed 3 times per season, for multiple seasons (redrafted from MacKenzie 2005). Estimates of occupancy can be facilitated by use of computer programs such as PRESENCE (MacKenzie et al. 2003).
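The difficulty of dismissing non-detection can be made concrete with a small calculation. The sketch below assumes that repeat visits are independent and that the per-visit detection probability for an occupied unit is constant; under those assumptions, the chance of missing a present species on every one of K visits is (1 − p)^K. The function names and the 5% default are illustrative assumptions, not part of any cited method.

```python
import math

def miss_probability(p_detect, n_visits):
    """Probability of never detecting a species that is present,
    assuming independent visits with constant detection probability."""
    return (1 - p_detect) ** n_visits

def visits_needed(p_detect, max_miss_prob=0.05):
    """Smallest number of visits keeping the miss probability at or
    below max_miss_prob, under the same assumptions."""
    return math.ceil(math.log(max_miss_prob) / math.log(1 - p_detect))
```

With a per-visit detection probability of 0.3, for instance, `visits_needed(0.3)` gives 9 visits before an absence can be declared at the 5% level, which illustrates why confirming absence is so much more demanding than confirming presence.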

Population size and density

National policy on threatened and endangered species is ultimately directed toward efforts to increase or maintain the total number of individuals of the species within their natural geographic range (Suckling and Taylor 2006). Total population size and effective population size (i.e., the number of breeding individuals in a population; Lande and Barrowclough 1987) most directly indicate the degree of species endangerment and effectiveness of conservation policies and practices. Population size or, more accurately, density per unit area is usually used as the basis for trend analyses because changes in density integrate changes in natural mortality, exploitation, and habitat quality. In some circumstances, it may be feasible to conduct a census of all individuals of a particular species in an area to determine the population density. Typically, however, population size and density parameters are estimated using statistical analyses based on only a sample of population members. Population densities of plants and sessile animals can be estimated from counts taken on plots or data describing the spacing between individuals (i.e., distance methods) and are relatively straightforward. Population analyses for many animal species must account for animal response to capture or observation, observer biases, and different detection probabilities among sub-populations. Pilot studies are usually required to collect the data necessary to address these factors in the analysis. Furthermore, mark-recapture studies, catch-per-unit effort surveys, and other estimation methods require multiple visits to sampling units (Pradel 1996). These considerations increase the complexity and cost of studies designed for population parameter estimation.

Abundance indices

The goals and objectives of some biological inventories and monitoring studies can be met with indices of population density or abundance, rather than population estimators. The difference between estimators and indices is that the former yield absolute values of population density while the latter provide relative measures of density that can be used to compare populations among places or times. Indices are founded on the assumption that index values are closely associated with values of a population parameter, although the precise relationship between the index and parameter usually is not quantified. Examples of abundance or density indices include plant canopy cover, numbers of individuals captured per 1000 trap nights, and counts of individuals observed during a standardized unit of time, among many others. From a data collection perspective, density indices often require less sampling intensity and complexity than population estimation procedures. However, population indices are not comparable among different studies unless field techniques are strictly standardized. Furthermore, the assumption that an abundance index closely approximates population density is rarely tested (Seber 1982).

Fitness data

For rare or declining populations, estimates of survival in each life stage as well as reproductive rates are required. These data not only provide useful trigger points for estimating rates of decline (lambda) but also allow trigger points to be set for removal of a species from a threatened or other legal status. Collecting these sorts of data is often labor intensive and expensive. In a study on northern spotted owls, for instance, millions of dollars have been spent collecting these types of data (Lint 2001). This is not particularly surprising as the types of data that would be necessary to understand the population dynamics of a bird are numerous and complicated to generate. Nest densities, clutch sizes, hatching rates, fledging rates, survival rates to maturity, and survival rates as reproductive adults would be a minimum data set. New approaches to estimating individual contributions to population growth and changes in distributions of quantitative traits and alleles include genetic analyses, which can lead to even more detailed understanding of the potential for a population to adapt to variations in environmental factors (Pelletier et al. 2009).

Research studies

Studies of habitat relationships or cause-and-effect responses require coordinated sampling of the target population and environmental measurements or stressors to which the population may respond. Data collection efforts tend to be complex, requiring multiple sampling protocols for the target population, study site attributes, and landscape pattern metrics. The funding required to conduct research studies typically limits their application to species or populations in greatest need of management planning such as those listed as threatened or endangered. Manipulative studies are often carried out to generate the necessary data, but when these focus on a threatened species, ethical questions often emerge about whether the experiment places the species at even greater risk, at least locally. Hence it is often monitoring of both environmental conditions and aspects of population density or fitness that is used to assess associations in trends between population parameters and environmental parameters.


High-quality reference genome assemblies for H. hermathena and H. nattereri

We first generated high-quality H. hermathena and H. nattereri genome assemblies to facilitate phylogenetic and population genomic analyses using Illumina paired-end and mate-pair sequencing (Table 1; Tables S1-S2, Additional file 1; the “Materials and methods” section). The final H. hermathena assembly comprised 392 Mb in 1913 scaffolds with an N50 of 560 kb, while the H. nattereri assembly comprised 276 Mb in 261 scaffolds with an N50 of 8.8 Mb (Table 1). These values were consistent with the size and heterozygosity estimates of 373 Mb and 0.0057 for H. hermathena and 258 Mb and 0.0075 for H. nattereri, respectively, from analyses of 21-mer frequencies in the raw sequencing data [34]. Furthermore, both genomes were predicted to be among the most complete and least redundant nymphalid assemblies available based on the presence and completeness of universal single-copy orthologs assayed using BUSCO [35] (Table 1).
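For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows the usual definition: the length of the scaffold at which the cumulative length, summed from the longest scaffold downward, first reaches half of the total assembly size. The function name `n50` and the toy scaffold lengths are illustrative only and are not taken from the assemblies described here.

```python
def n50(scaffold_lengths):
    """Return the N50 of a list of scaffold (or contig) lengths."""
    lengths = sorted(scaffold_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("scaffold_lengths must not be empty")

# Toy example with six scaffolds totalling 30 units.
toy_n50 = n50([2, 3, 4, 5, 6, 10])
```

In the toy example the two longest scaffolds (10 and 6) together cover 16 of 30 units, so the N50 is 6.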

Phylogenetic placement of H. hermathena and H. nattereri

Heliconius consists of two major clades with unique characteristics: the erato-sara clade and the melpomene-silvaniform clade (Fig. 1). Morphological studies have placed H. hermathena within the erato-sara clade [25] and H. nattereri within the melpomene-silvaniform clade, but molecular phylogenetics results have been unclear about the fine-scale placement of species. Beltrán et al. [30] used four autosomal and four mitochondrial genes while Kozak et al. [36] used 20 autosomal and three mitochondrial genes to infer Heliconius species relationships. These studies placed H. nattereri and H. ethilla as sister species with low statistical support and H. hermathena as a polytomy with H. himera and H. erato. In a recent pre-print, Kozak et al. [37] used genome-wide SNP calls relative to H. melpomene to reconstruct the Heliconius species tree and found H. hermathena as sister to H. erato and H. himera; they did not include H. nattereri. However, hybridization and gene flow are common within the major Heliconius clades, confounding estimation of species relationships using standard molecular phylogenetic methods based on few loci or a single reference genome [15, 20, 37].

To resolve the placement of H. hermathena and H. nattereri within Heliconius, we performed whole-genome alignments and reconstructed species trees based on genome-wide data following Edelman et al. [20]. Edelman et al. [20] inferred the relationships among 13 Heliconius species by aligning de novo-assembled genomes using the progressiveCactus alignment pipeline [38, 39], inferring gene trees for short non-overlapping aligned regions, and summarizing those gene tree topologies using ASTRAL [40, 41]. This approach largely avoids biases introduced by using a single reference genome or a small number of gene trees and therefore more fully captures the different relationships among different genome regions due to gene flow [20]. Support for each branch in the species tree is calculated as the fraction of gene trees that include a particular four-taxon topology (ASTRAL’s quartet score) [40]. We aligned our new reference genomes and a new high-quality Heliconius charithonia reference assembly that we generated using publicly available data (Table 1; the “Materials and methods” section) to the multi-species alignment produced by Edelman et al. [20] using progressiveCactus [38, 39]. We then constructed maximum likelihood (ML) trees using autosomal, non-overlapping 10-kb windows and used these trees to infer the Heliconius species tree with ASTRAL-iii (Fig. 1; Figure S1–S10, Additional file 2) [38,39,40,41,42]. The final species tree was inferred using 18 Heliconius genomes and 8674 windows (Fig. 1b). We corroborated these results with smaller coding and non-coding alignment blocks, where the effects of intra-alignment recombination are limited; within the melpomene-silvaniform and erato-sara clades separately; and using only Z-linked windows, as the Z is generally more resistant to gene flow and may better represent the true relationships between the species (Figure S2-S10, Additional file 2; the “Materials and methods” section) [20].

All of these analyses placed H. hermathena as sister to H. erato and H. himera (Fig. 1b; Figure S7-S10, Additional file 2). The branches joining these species together and separating H. hermathena from the other two had quartet scores > 0.89 in analyses of autosomal and Z-linked windows (Fig. 1b; Figure S8-S9, Additional file 2), and somewhat lower scores when using the shorter coding and non-coding blocks (> 0.54; Figure S7, Additional file 2). These results are consistent with the results of Kozak et al. [37] based on genome-wide SNP data. The primary source of discordance (i.e., low quartet scores) in the erato-sara clade is the hybridization event or events between the H. sara/H. demeter ancestor and H. hecalesia described by Kozak et al. [37] and Edelman et al. [20]; this event was reflected by the different relationships between these species in the predominant 10-kb window topologies (Figure S11, Additional file 2) in our analyses.

The placement of H. nattereri remained less certain. First, in contrast to Beltrán et al. [30] and Kozak et al. [36], all of our reconstructions based on autosomal loci place H. nattereri as an outgroup to the remaining members of the melpomene-silvaniform clade (Fig. 1b; Figure S7-S10, Additional file 2). Consistent with previous studies, the melpomene clade nested within the silvaniforms, and there was generally low concordance among trees estimated from different regions (Fig. 1b; Figure S7-S8, Additional file 2). Second, species tree reconstructions based on Z-linked windows recovered a clade comprising H. nattereri, H. numata, and H. besckei as an outgroup to the remaining melpomene-silvaniform clade species (Figure S9, Additional file 2). Kozak et al. [37] did not include H. nattereri but recovered a similar lineage in their analysis of Z-linked markers that included H. numata, H. besckei, and H. ismenius.

Finally, we assembled and analyzed mitochondrial genomes of 33 Heliconius species to more directly compare our results with previous studies. We assembled a typical ~15-kb contig for 31 of 33 species by extracting and assembling 26 new mitochondrial genomes from publicly available sequencing data using NOVOPlasty and seven reference mitochondrial genomes (Table S3, Additional file 1) [43]. We then inferred species relationships using these 33 sequences and ML (Fig. 1c) [42]. The relationships we found were similar to those recovered by the smaller mtDNA analyses in Beltrán et al. [30] and Kozak et al. [36], with differences only in clades that historically have been difficult to resolve such as in the melpomene-silvaniform clade (Fig. 1c). This phylogeny was identical to the one recovered by Kozak et al. [37]. Importantly, H. hermathena was inferred to be sister to H. erato and H. himera. However, in sharp contrast with species tree reconstructions based on genome-wide data, the mtDNA analysis recovered the silvaniforms as a monophyletic clade, with H. nattereri nested within.

Small effective population sizes of H. nattereri and H. hermathena result in high deleterious mutation loads

Both H. hermathena and H. nattereri have patchy, limited distributions that make them difficult to find and study in their natural habitats. Heliconius nattereri in particular is restricted to a few pockets of Atlantic Forest in a narrow region of eastern Brazil, usually only above ~500 m elevation, and is already listed as endangered by the IUCN [21,22,23,24]. To better understand the genetic health of these rare species, we estimated their current genetic diversity, historical population sizes, and deleterious mutation loads using population genomic data.

We sequenced the complete genomes of eight H. nattereri individuals from two locations and 71 H. hermathena individuals spanning six subspecies from seven localities (3–19 individuals per site) and used those data to analyze patterns of variation within and between populations (Fig. 1; Table S1, Additional file 1). We first estimated nucleotide diversity per site (π) in populations of H. hermathena, H. nattereri, and their close relatives H. erato and H. melpomene for comparison (Fig. 2; Tables S4-S5, Additional file 1). Heliconius erato and H. melpomene are widespread, abundant species that have well-characterized genetic history and population structure (e.g., [14, 17]). While average π in H. melpomene and H. erato was 0.0197 and 0.0251, respectively, consistent with previous estimates [14, 17, 44], H. nattereri and H. hermathena carry average π of only 0.0072 and 0.0047 (Fig. 2; Table S5, Additional file 1). H. hermathena populations in particular contain little genetic diversity, with average π ranging from only 0.0011–0.0036, consistent with observations of few individuals in any one locality [25]. The species-wide π values are consistent with analyses of 21-mers performed during reference genome assemblies above. We estimated that the current effective population size (Ne) of H. nattereri is ~620,000 (327,000–1,385,000), and the current Ne of H. hermathena is ~405,000 (214,000–904,000; Table S5, Additional file 1) using the measured H. melpomene mutation rate of 2.9e−9 per site per generation (μ; 95% CI 1.3e−9 to 5.5e−9) [45] and the classic estimator π = 4Neμ [46]. These estimates should be viewed with caution, however, as the relationship between π and Ne assumes that populations are at equilibrium and that variant sites are evolving strictly neutrally, assumptions that are not likely to be true (see below).
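The arithmetic behind these Ne values follows directly from rearranging π = 4Neμ to Ne = π / (4μ). The short sketch below reproduces it using the π and μ values quoted in the text; the function name `effective_population_size` is an illustrative choice, and the interval is obtained simply by plugging in the bounds of the mutation rate confidence interval, which is an assumption about how the reported ranges were derived.

```python
def effective_population_size(pi, mu):
    """Solve pi = 4 * Ne * mu for Ne (diploid, neutral, equilibrium assumptions)."""
    return pi / (4 * mu)

MU, MU_LOW, MU_HIGH = 2.9e-9, 1.3e-9, 5.5e-9  # mutation rate and 95% CI from the text

for species, pi in [("H. nattereri", 0.0072), ("H. hermathena", 0.0047)]:
    point = effective_population_size(pi, MU)
    lower = effective_population_size(pi, MU_HIGH)  # higher mu gives lower Ne
    upper = effective_population_size(pi, MU_LOW)   # lower mu gives higher Ne
    print(f"{species}: Ne = {point:,.0f} ({lower:,.0f} to {upper:,.0f})")
```

The point values come out near 621,000 and 405,000, with ranges of roughly 327,000 to 1,385,000 and 214,000 to 904,000, matching the figures above after rounding.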

Fig. 2. Current and historical effective population sizes in Heliconius nattereri and H. hermathena. a Nucleotide diversity per site (π) calculated in non-overlapping 10-kb windows across the autosomes. b Tajima’s D statistic calculated in 10-kb non-overlapping windows across the autosomes. c SMC++ analysis results for H. hermathena. d SMC++ analysis results for H. nattereri. Color schemes follow those in Fig. 1a. Geographical assignments for H. erato and H. melpomene subspecies are shown in Table S4 (Additional file 1)

Current Ne estimates reflect the harmonic mean of population sizes over recent history. To better understand the recent history of H. hermathena and H. nattereri, we estimated historical population sizes using the multi-sample coalescent approach implemented in SMC++ (Fig. 2) [47]. We found a mixture of H. hermathena population size histories (Fig. 2). However, all H. hermathena populations were predicted to have been small and followed similar trajectories until ~10,000 years ago, when the southern populations H. h. vereatta and H. h. sheppardi from Manaus expanded quickly. In contrast to H. hermathena, we found that H. nattereri population sizes reached a peak ~100,000 years ago and have declined steadily since. The Santa Teresa population in particular has remained small, Ne ≈ 40,000, for the past 30,000 years. The most recent estimates place the Bahia and Santa Teresa population sizes at 424,000 and 40,000, respectively (Fig. 2; Table S4). The declines in H. nattereri and H. hermathena coincide with the end of the last glacial maximum, about 12,000 years ago.

Previous studies of Heliconius population size histories have only used the single-sample coalescent method implemented in PSMC [48]. However, SMC++ is more powerful and able to accurately infer more recent population size changes than PSMC [47]. We include PSMC results for comparison in Figure S12 (Additional file 2). The PSMC and SMC++ results from 10⁴ to 10⁶ years ago are nearly identical.

We consistently found that H. nattereri and H. hermathena Ne was only 20–25% that of their more widely distributed relatives H. melpomene and H. erato (Fig. 2). Small populations are expected to harbor more slightly deleterious alleles due to the strength of genetic drift relative to natural selection, so we expected H. nattereri and H. hermathena to carry higher deleterious mutation loads than other closely related species with larger effective population sizes. We expected this signal to be especially strong if these species underwent a recent, strong population bottleneck like the one predicted by SMC++ [49, 50]. We therefore calculated the numbers of species-specific substitutions and the site frequency spectrum of derived mutations as qualitative measures of H. nattereri and H. hermathena genetic health.

We first compared substitutions and the frequency spectrum of derived polymorphisms in H. nattereri to those in its close relatives H. pardalinus and H. melpomene (Fig. 3a). We called SNPs for each species relative to the H. melpomene genome, inferred the ancestral state for each site, then inferred the impact of each derived mutation on H. melpomene gene models using snpEff [51]. We then calculated the frequency spectra for neutral and deleterious mutations separately (the “Materials and methods” section). H. nattereri has accumulated a significantly higher fraction of deleterious substitutions (7.5% of all substitutions) than either H. pardalinus (5.9%) or H. melpomene (2.0%) since they diverged from their last common ancestor (χ²₁ > 1272, p < 2e−16 for the three tests; Table S6, Additional file 1). Furthermore, H. nattereri harbors an excess of alleles at intermediate and high frequencies, regardless of their impact (Fig. 3). This excess of intermediate and high-frequency alleles further suggests that H. nattereri recently underwent a strong population bottleneck, consistent with the SMC++ analysis (Fig. 2). This bottleneck hypothesis was supported by a high genome-wide average (+ 0.82) and standard deviation (0.66) of Tajima’s D statistic in H. nattereri (Table S5, Additional file 1) [52]. We did not observe these SFS or Tajima’s D patterns in H. pardalinus or H. melpomene, despite those species carrying similar or higher levels of diversity to H. nattereri (0.0077 and 0.015, respectively; Fig. 3). Our results therefore quantify the threatened status of H. nattereri at the genetic level.
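To show why an excess of intermediate-frequency alleles pushes Tajima’s D upward, here is a minimal sketch of the standard Tajima (1989) statistic computed from a small set of aligned haplotypes. It is not the pipeline used for the windowed estimates reported here; the function name `tajimas_d` and the toy sequences are purely illustrative, and the code assumes equal-length sequences without missing data.

```python
import math
from itertools import combinations

def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length haplotype strings.

    Returns None when the statistic is undefined (fewer than 4 sequences
    or no segregating sites).
    """
    n = len(haplotypes)
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    if n < 4 or seg_sites == 0:
        return None
    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)

# Every site split 2/2 between haplotypes: intermediate-frequency alleles.
balanced = ["AAAA", "AAAA", "TTTT", "TTTT"]
# Every variant carried by a single haplotype: rare alleles only.
singletons = ["TAAA", "ATAA", "AATA", "AAAT"]
```

`tajimas_d(balanced)` is positive and `tajimas_d(singletons)` is negative, which mirrors the logic in the text: a positive genome-wide average is what one expects when rare alleles have been lost, as after a bottleneck.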

Fig. 3. Deleterious mutation loads in Heliconius nattereri and H. hermathena. a Derived allele frequency spectra in the Santa Teresa H. nattereri population, its close relative H. pardalinus, and H. melpomene melpomene. b Derived allele frequency spectra from H. erato and H. hermathena populations with the largest sample sizes. Dashed red lines indicate expected fractions based on the coalescent

We found similarly skewed substitution proportions, site frequency spectra, and Tajima’s D distributions in H. hermathena, suggesting recent bottlenecks and a high deleterious mutation load (Figs. 2 and 3). H. hermathena contains an excess of both fixed (χ²₁ = 5100, p < 2e−16; Table S7, Additional file 1) and intermediate- and high-frequency deleterious alleles relative to H. erato (χ²₁₀ > 3,032,371, p < 2e−16 for all tests). The strength of the skew was inversely related to current Ne estimates based on π (Figs. 2 and 3).

Strong population structure in Heliconius hermathena likely causes small Ne

While there is good evidence that H. nattereri is a sensitive and rare species, it was less clear why H. hermathena exhibits such small population sizes and high deleterious mutation loads. However, H. hermathena comprises seven recognized subspecies from white sand habitats (campina and campinarana) scattered around the Amazon River Basin that few other Heliconius species can tolerate [25,26,27]. This patchy distribution, high habitat fidelity, and observed low dispersal led Brown and Benson [25] to hypothesize that H. hermathena was once widespread but recently fragmented by the expansion of the Amazon rainforest after the last glacial maximum, about 12,000 years ago. We therefore tested whether H. hermathena population fragmentation was contributing to the population genomic patterns we observed.

We first assayed genetic differentiation between H. hermathena populations relative to their widespread relatives using FST. Consistent with the hypothesis that their populations are strongly isolated, we found a strong positive correlation between FST and geographical distance between H. hermathena populations (Fig. 4; Table S8, Additional file 1). The rate at which FST increases with geographical distance is nearly four times higher in H. hermathena than in the more widely distributed H. erato or H. melpomene (Fig. 4; Tables S9-S10, Additional file 1).

Fig. 4. The relationship between genetic distance (FST/(1 − FST)) and geographical distance for Heliconius nattereri, H. hermathena, H. melpomene, and H. erato populations; see Tables S1 and S4 for population and geographical location information (Additional file 1)
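The isolation-by-distance slope described above can be sketched as an ordinary least-squares fit of linearized FST, FST/(1 − FST), against geographical distance. The function name `ibd_slope` is hypothetical, and plain least squares is an assumption made for illustration: pairwise comparisons are not independent, so a real analysis would typically assess significance with something like a Mantel test.

```python
def ibd_slope(distances, fst_values):
    """Least-squares slope and intercept of FST/(1 - FST) on distance."""
    genetic = [f / (1 - f) for f in fst_values]  # linearized FST
    n = len(genetic)
    mean_x = sum(distances) / n
    mean_y = sum(genetic) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, genetic))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Given pairwise distances and FST values for two species, comparing the two returned slopes gives the kind of ratio reported in the text (a roughly fourfold steeper increase in H. hermathena).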

We next inferred H. hermathena population structure using Admixture and a series of expected numbers of populations (k = 2 to k = 10; Fig. 5; Figure S13, Additional file 2) [53]. Figure 5a shows the results for the number of populations with the lowest cross-validation error (4) and the number of localities/subspecies we sampled (7). Most individuals were well-differentiated by geographical location (Fig. 5), particularly populations separated by the Amazon River. The most admixed individuals are found in the Faro population, consisting of H. h. duckei and the mimetic H. h. vereatta. This strong population structure was also apparent from a haplotype network constructed from whole mtDNA sequences from these 71 individuals, with the exception of two H. h. sheppardi individuals from Presidente Figueiredo (hher39, hher40) that grouped with H. h. sheppardi from Manaus (Fig. 5; Tables S1 and S11). These results are similar to those based on a mtDNA barcode [27].

Fig. 5. Population structure and gene flow between Heliconius hermathena populations. a Admixture analysis results for the most likely k (4) and the actual number of populations sampled (7). All k values tested are shown in Fig. S13 (Additional file 2). b Relative migration rates across the range of H. hermathena, calculated by EEMS. c Bayesian concordance tree generated from 10-kb autosomal windows. Note that both H. h. vereatta (gray) and H. h. duckei (black) are included in the Faro clade; see Fig. S15 for the full tree (Additional file 2). d Mitochondrial DNA haplotype network constructed using whole mtDNA sequences and popArt. Each hash indicates one mutation, and numbers beside branches indicate total numbers of mutations on that branch. The size of each circle is proportional to the number of individuals sharing that mtDNA haplotype. Circles are colored according to the source population

The scarcity of admixed individuals in any of the core populations suggests that migration is rare between H. hermathena populations. Brown and Benson [25] noted that H. hermathena do not disperse more than a few hundred meters from their home ranges, probably due to their habitat fidelity and preference for campina and campinarana over deep forest. This finding further supports the existence of strong barriers to gene flow between geographically close H. hermathena populations (Figs. 4 and 5). We next sought to visualize those barriers to gene flow. We estimated effective migration rates between H. hermathena populations using EEMS [54]. In contrast to analyses of pairwise FST, EEMS estimates migration rates across a geographic region using the geographic locations of, and genetic similarity estimates between, all populations (Fig. 5b; Figure S14) [54]. Consistent with Admixture results, the Amazon River and its tributaries appear to provide strong isolating barriers between the northern (H. h. sabinae, H. h. sheppardi, H. h. duckei, and H. h. vereatta) and southern (H. h. hermathena and H. h. curua) populations: the narrow strip of extremely low effective migration rates matched the course of the Amazon despite the fact that EEMS is agnostic to topography. The only pairs of populations predicted to frequently share migrants were H. h. duckei and H. h. vereatta from Faro, which are known to frequently hybridize [25] (RRR, personal observation), and the H. h. hermathena populations from Santarém and Maués, which were indistinguishable in Admixture analyses (Fig. 5a; Figure S13, Additional file 2).

Finally, we reconstructed the phylogenetic relationships among H. hermathena populations to begin to understand how H. hermathena originated and became so widespread. We constructed a Bayesian concordance tree using autosomal 10-kb windows (Fig. 5c). Similar to ASTRAL quartet scores, concordance factors (CFs) provide an estimate of the congruence between tree topologies estimated from different genomic windows; low CFs may be caused by a variety of factors, including incomplete lineage sorting and gene flow between populations [55]. The concordance tree shown in Fig. 5c suggests that the subspecies groupings are well-supported by whole-genome data (CFs > 0.65), but that geographically adjacent populations are more weakly differentiated (see Figure S15, Additional file 2, for all individuals). Altogether, we find that H. hermathena is fragmented into small discrete populations that appear to rarely exchange migrants. This reconstruction also placed the Faro population as the most basal H. hermathena clade (Fig. 5c).

No genome-wide evidence that H. hermathena is a hybrid species

We were next interested in determining the origin of H. hermathena. Heliconius hermathena displays a unique combination of red and yellow color patterns that led several authors to posit that it may have been formed by hybridization between H. erato and H. charithonia [25, 31, 32]. In particular, only H. hermathena and the distantly related H. charithonia display characteristic rows of submarginal yellow hindwing spots (Fig. 1a). A single putative H. charithonia x H. erato hybrid has been discovered [56], so it is possible that historical hybridization could have generated a new, distinct species. We tested if H. hermathena was formed by hybridization between H. erato and H. charithonia by calculating the D statistic in 10-kb windows across the autosomes using the species tree (((H. erato, non-mimetic H. hermathena), H. charithonia), H. melpomene) [57, 58]. Significantly positive D would indicate that H. hermathena shares more derived alleles with H. charithonia than expected due to incomplete lineage sorting since the three focal species diverged, and would therefore support a hybrid origin of H. hermathena. We found D = −0.015 ± 0.008, suggesting that this is not the case (Table S11, Additional file 1, comparison 5). The Z chromosome exhibited similar values (D = −0.027 ± 0.034; Table S11, Additional file 1, comparison 5). Furthermore, no 10-kb window trees from our phylogenetic analyses grouped H. hermathena and H. charithonia, and only 24 of 21,247 autosomal trees (0.11%) and no Z trees grouped H. charithonia with H. erato, H. himera, and H. hermathena. The proportion of autosomal trees grouping these four taxa was similar to the proportion of trees grouping H. sara (0.09%) or H. demeter (0.08%) with H. erato, H. himera, and H. hermathena. We therefore found no genome-wide evidence that H. hermathena was formed by hybridization between H. charithonia and H. erato.
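As a rough illustration of the D statistic used in this test, the sketch below computes D from per-site derived-allele frequencies for the tree (((P1, P2), P3), outgroup), with P1 = H. erato, P2 = H. hermathena, P3 = H. charithonia, and H. melpomene as the outgroup. The function name `patterson_d`, the input layout, and the toy sites are assumptions made for illustration; this is not the authors' pipeline, and it omits the windowing and the uncertainty estimate reported in the text.

```python
def patterson_d(sites):
    """Return D for sites given as (p1, p2, p3, p_out) derived-allele frequencies.

    ABBA sites are those where P2 and P3 share the derived allele;
    BABA sites are those where P1 and P3 share it. A positive D means
    P2 shares an excess of derived alleles with P3.
    """
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, p_out in sites:
        abba += (1 - p1) * p2 * p3 * (1 - p_out)
        baba += p1 * (1 - p2) * p3 * (1 - p_out)
    total = abba + baba
    if total == 0:
        return float("nan")  # no informative sites
    return (abba - baba) / total


# Hypothetical toy data: one ABBA site, one BABA site, one uninformative site.
toy_sites = [
    (0.0, 1.0, 1.0, 0.0),
    (1.0, 0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0, 0.0),
]
d_balanced = patterson_d(toy_sites)
```

With equal ABBA and BABA support, as in the toy data, D is zero, which is the expectation under incomplete lineage sorting alone.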

Mimetic H. hermathena vereatta originated via introgression from H. erato

Heliconius hermathena is thought to be one of the rare examples of a non-mimetic Heliconius species because its color pattern does not resemble the color pattern of any other co-occurring species [25, 27, 59]. The exception is H. hermathena vereatta, which lacks yellow patterns and mimics co-occurring H. erato hydara and H. melpomene melpomene near the town of Faro [25]. The mimetic color pattern evolved and is maintained in Faro despite frequent interbreeding between H. h. duckei and H. h. vereatta, suggesting that strong natural selection for mimicry preserves the mimetic form (Fig. 1). We therefore searched for loci associated with yellow presence/absence by scanning for genome regions with high allele frequency differences between vereatta and duckei. We found a single narrow peak of FST on chromosome 15 containing the known color patterning gene cortex. This region showed four-fold higher differentiation than the genome-wide average (Fig. 6). Cortex is part of the H. erato Cr (H. melpomene Yb) locus that controls the presence of the yellow hindwing band across Heliconius, and cortex expression patterns in pupal wing discs prefigure adult melanic patterns [10]. Heliconius h. duckei contained significantly lower π (i.e., higher homozygosity) in the FST peak relative to H. h. vereatta (0.0009 vs. 0.0057), suggesting that the vereatta cortex allele is dominant to the duckei allele and melanizes the yellow patterns typical of the other H. hermathena subspecies.

Evolution of the mimetic Heliconius hermathena color pattern. a Genome-wide FST between mimetic H. h. vereatta and non-mimetic H. h. duckei, calculated in non-overlapping 10-kb windows. b FST between H. h. vereatta and H. h. duckei at the chromosome 15 peak, calculated in 5-kb windows (500-bp step). c Genome-wide fd tests for introgression from H. erato hydara into H. h. vereatta, calculated in non-overlapping 10-kb windows. H. melpomene was used as an outgroup. d fd at the chromosome 15 peak. The coordinates of the erato-sara clade inversion are also shown (see the “Results” and “Discussion” sections and Fig. S17). e Relative divergence between H. h. duckei and H. h. vereatta to H. erato. a, b were calculated using SNP calls relative to the H. hermathena reference genome, while c–e were calculated using SNP calls relative to the H. melpomene reference genome to reduce reference bias. For consistency, all results are plotted relative to the H. melpomene reference genome, and all coordinates and gene models are based on the H. melpomene genome (the “Materials and methods” section). f Maximum likelihood tree based on the variation in the cortex region with high FST between mimetic and non-mimetic H. hermathena (highlighted yellow in b and d). Collapsed clades contain multiple individuals of the same species/subspecies. Only branches with less than 100% bootstrap support are labeled. H. erato hydara image: Field Museum of Natural History 124251, CC-BY-NC

Heliconius erato populations have evolved similar color patterns by sharing alleles via hybridization, which mediates mimicry between local H. erato and H. melpomene races [12, 60]. This same process operates in H. melpomene, but mimicry between melpomene-silvaniform clade species has also frequently been mediated by introgression of color patterning alleles between co-mimetic species [15, 18, 61,62,63]. We next tested whether H. h. vereatta and H. erato converged on their shared color pattern independently or via introgression (Fig. 6d). We specifically tested whether H. hermathena received its melanic cortex allele from sympatric H. erato hydara using the fd statistic (Fig. 6d) [64]. The patterns of fd mirrored the patterns of FST between H. h. duckei and H. h. vereatta: we found a single narrow peak of fd just upstream of cortex, suggesting that the H. h. vereatta color pattern was formed by introgression of a cortex allele from H. erato into a H. h. duckei-like ancestor (Fig. 6c, d). Consistent with this hypothesis, divergence (dxy) between H. h. vereatta and H. erato was significantly lower in this region than genome-wide or between H. h. duckei and H. erato, and ML analysis of variation in this region grouped H. erato and H. h. vereatta (Fig. 6e, f).

While we detected no genome-wide evidence for hybridization between H. hermathena and H. charithonia, it is also possible that these species share color patterns because they shared color patterning alleles via hybridization. Alternatively, hybridization may be so ancient that its signal has been eroded by continued divergence. However, we found no fd signatures suggesting that cortex alleles have been shared between H. hermathena and H. charithonia (Figure S16). Interestingly, the ML tree in Fig. 6f showed that H. charithonia was placed outside a clade containing its sister species H. demeter and H. sara, and H. telesiphe and H. hecalesia. Edelman et al. [20] showed that there was at least one ancient hybridization event between the ancestors of H. sara/H. demeter and H. hecalesia/H. telesiphe that transferred a large (500 kb) inversion centered on cortex. The tree in Fig. 6f suggests that H. charithonia carries the ancestral standard allele in this region, and, indeed, we found (1) two large H. charithonia scaffolds (126 kb and 263 kb) that span the inversion breakpoints in the standard orientation and (2) increased dxy between H. charithonia and species carrying the inversion in this region (Figure S17, Additional file 2). Thus, H. erato, H. himera, H. hermathena, and H. charithonia share ancestral non-inverted haplotypes in the cortex region, perhaps suggesting that H. hermathena and H. charithonia yellow patterning alleles arose independently or are controlled by ancestral variation in this region. The mimetic H. h. vereatta color pattern then appears to be a derived phenotype mediated by introgression of a small upstream region of cortex from co-occurring H. erato.


The total area predicted as suitable for each species fluctuated across seasons and years with distinctly different patterns among species. The enormous variation in dynamics across species suggests that the models reflected the different relationship between each species and the environmental variables rather than the variation in particular environmental variables. Plots and animated maps of temporal range size dynamics are provided in Supporting Information. By way of example, the modeled range size for the Scarlet-chested Parrot (Neophema splendida) showed a strong degree of seasonal fluctuation with repeated seasonal minima in March (Fig. 1a). This seasonal fluctuation was overlain with longer term fluctuation in both minima and maxima. Sixteen species (35%) showed such seasonal fluctuations.

Examples of temporal dynamics in geographic range size for birds in arid Australia: (a) Scarlet-chested Parrot, (b) Black Honeyeater, (c) Letter-winged Kite, (d) Yellow Chat, (e) Gibberbird (dotted lines, mean annual rainfall for Australia for the period). (f) Mean annual rainfall (dotted line) relative to mean annual fraction of photosynthetic vegetation (solid line) and mean annual fraction of non-photosynthetic vegetation (dashed line) across Australia from 2000 to 2011.

Not all species showed extreme seasonal effects; 27 species (63%) exhibited some seasonal variation superimposed onto more complex dynamics. For instance, the Black Honeyeater (Sugomel nigrum) displayed slight seasonal variation but much stronger and more complex long-term effects (Fig. 1b). At the beginning of the period, which corresponded to high rainfall across interior Australia (2000 to late 2002), the species was predicted to occupy a large area. Notably, the minima in these years exceeded the maxima of later years, and the distribution contracted to a low in January 2010.

Species showed mixed responses to landscape-wide dynamics in rainfall and drought. Letter-winged Kite (Elanus scriptus) ranges contracted dramatically corresponding to landscape-wide drought after 2003 and expanded to postdrought levels at the end of the time series (Fig. 1c). These nocturnal raptors feed on rodents whose populations irrupt after high rainfall events such as those in 2000 to 2002 (Pavey et al. 2008). Recently there has been a spike in records corresponding with the latest rainfall event in 2009 to 2011 (Fig. 1c & 1f) (Pavey & Nano 2013). Six other species showed a similar pattern (Black-shouldered Kite [Elanus axillaris], Spotted Harrier [Circus assimilis], Stubble Quail [Coturnix pectoralis], Mistletoebird [Dicaeum hirundinaceum], Black Falcon [Falco subniger], and Budgerigar [Melopsittacus undulatus]). An additional 7 species showed a weaker time-lagged contraction after 2003 with no recovery after 2009 (Grey Honeyeater [Conopophila whitei], Ground Cuckooshrike [Coracina maxima], Grey-headed Honeyeater [Ptilotula keartlandi], Grey-fronted Honeyeater [Ptilotula plumula], White-fronted Honeyeater [Purnella albifrons], Black Honeyeater). Conversely, the habitat for 3 species expanded as the landscape dried out after 2003 (Fig. 1d; Yellow Chat [Epthianura crocea], Orange Chat [Epthianura aurifrons], and Chestnut-breasted Whiteface [Aphelocephala pectoralis]).

Interestingly, one species, the Gibberbird (Ashbyia lovensis), a species usually described in the literature as nomadic or locally nomadic (Marchant & Higgins 1990), displayed an approximately constant range size even though the location of these areas was dynamic (Fig. 1e & Supporting Information).

Some species showed extreme fluctuations between the maximum and minimum range size (Fig. 2), and the magnitude of these fluctuations increased as mean range size decreased. In part this is inevitable because fluctuation of the wider ranging species is limited by the size of the Australian continent. Of the 43 species, 11 showed extreme fluctuation (>1 order of magnitude) (Table 1) as defined by IUCN Red List criterion B2cii (IUCN 2014). Trends in environmental suitability fluctuated markedly according to geographic location and position in the species’ range. In the case of the Black Honeyeater, sites in the core of the species range showed little variation in environmental suitability (Fig. 3b) relative to sites at the margin of the species’ geographic distribution (Fig. 3c & 3d).

Table 1. Range size and extinction risk metrics for 43 nomadic bird species

| Common name | Scientific name | Pooled range size (km²) | Minimum range size (km²) | Magnitude of fluctuation in range size | Satisfies criterion B2 (range size < 2000 km²) | Satisfies subcriterion B2cii (extreme fluctuation) |
|---|---|---|---|---|---|---|
| Stubble Quail | Coturnix pectoralis | 1,819,376 | 169,017 | 7 | | |
| Black-shouldered Kite | Elanus axillaris | 2,645,411 | 113,305 | 15 | | yes |
| Letter-winged Kite | Elanus scriptus | 719,691 | 60,454 | 10 | | yes |
| Spotted Harrier | Circus assimilis | 3,559,606 | 583,026 | 4 | | |
| Australian Bustard | Ardeotis australis | 3,135,949 | 1,123,919 | 2 | | |
| Common Bronzewing | Phaps chalcoptera | 1,097,672 | 86,879 | 6 | | |
| Flock Bronzewing | Phaps histrionica | 916,107 | 84,554 | 8 | | |
| Diamond Dove | Geopelia cuneata | 2,731,995 | 220,878 | 9 | | |
| Grey Falcon | Falco hypoleucos | 2,572,585 | 882,558 | 2 | | |
| Black Falcon | Falco subniger | 2,675,534 | 537,230 | 3 | | |
| Major Mitchell's Cockatoo | Lophochroa leadbeateri | 2,404,222 | 560,730 | 3 | | |
| Cockatiel | Nymphicus hollandicus | 3,270,352 | 106,111 | 18 | | yes |
| Bourke's Parrot | Neopsephotus bourkii | 1,657,523 | 746,496 | 2 | | |
| Scarlet-chested Parrot | Neophema splendida | 496,793 | 776 | 502 | yes | yes |
| Budgerigar | Melopsittacus undulatus | 2,789,945 | 186,998 | 11 | | yes |
| Black Honeyeater | Sugomel nigrum | 2,206,769 | 237,940 | 7 | | |
| Pied Honeyeater | Certhionyx variegatus | 2,538,637 | 630,913 | 3 | | |
| Brown Honeyeater | Lichmera indistincta | 2,571,125 | 138,958 | 12 | | yes |
| Painted Honeyeater | Grantiella picta | 780,039 | 92,922 | 4 | | |
| Striped Honeyeater | Plectorhyncha lanceolata | 659,307 | 82,817 | 5 | | |
| Gibberbird | Ashbyia lovensis | 327,149 | 151,157 | 1 | | |
| Crimson Chat | Epthianura tricolor | 2,611,986 | 157,107 | 13 | | yes |
| Orange Chat | Epthianura aurifrons | 2,138,565 | 493,032 | 3 | | |
| Yellow Chat | Epthianura crocea | 257,089 | 26,570 | 5 | | |
| White-fronted Chat | Epthianura albifrons | 625,249 | 64,954 | 6 | | |
| Grey Honeyeater | Conopophila whitei | 1,297,181 | 108,314 | 10 | | yes |
| Spiny-cheeked Honeyeater | Acanthagenys rufogularis | 2,063,826 | 448,022 | 4 | | |
| White-fronted Honeyeater | Purnella albifrons | 1,669,300 | 103,538 | 11 | | yes |
| Grey-headed Honeyeater | Ptilotula keartlandi | 1,814,667 | 185,357 | 6 | | |
| Grey-fronted Honeyeater | Ptilotula plumula | 2,210,412 | 255,598 | 5 | | |
| Striated Pardalote | Pardalotus striatus | 1,161,005 | 219,578 | 3 | | |
| Western Gerygone | Gerygone fusca | 2,271,607 | 384,749 | 4 | | |
| Chestnut-breasted Whiteface | Aphelocephala pectoralis | 71,193 | 371 | 720 | yes | yes |
| Banded Whiteface | Aphelocephala nigricincta | 1,446,464 | 336,688 | 3 | | |
| Ground Cuckooshrike | Coracina maxima | 3,155,208 | 448,945 | 5 | | |
| Grey Fantail | Rhipidura albiscapa | 436,107 | 86,055 | 3 | | |
| Little Crow | Corvus bennetti | 2,508,774 | 1,111,711 | 2 | | |
| Jacky Winter | Microeca fascinans | 1,564,910 | 326,901 | 3 | | |
| Red-capped Robin | Petroica goodenovii | 2,829,147 | 562,818 | 4 | | |
| Mistletoebird | Dicaeum hirundinaceum | 2,873,533 | 336,324 | 5 | | |
| Painted Finch | Emblema pictum | 1,494,337 | 350,768 | 3 | | |
| Plum-headed Finch | Neochmia modesta | 955,399 | 89,282 | 7 | | |
| Pictorella Mannikin | Heteromunia pectoralis | 1,284,739 | 33,227 | 26 | | yes |

Mean modeled geographic range size relative to the magnitude of fluctuation in range size (maximum range size divided by minimum range size) for 43 nomadic species. Those species with fluctuations between minimum and maximum range size of more than one order of magnitude are labeled (AP, Aphelocephala pectoralis; NS, Neophema splendida; HP, Heteromunia pectoralis; NH, Nymphicus hollandicus; EA, Elanus axillaris; ES, Elanus scriptus; LI, Lichmera indistincta; ET, Epthianura tricolor; MU, Melopsittacus undulatus; PA, Purnella albifrons; CW, Conopophila whitei).
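The magnitude of fluctuation described here (maximum range size divided by minimum range size) can be sketched in a few lines. The function names, the threshold argument, and the example time series below are hypothetical and are not taken from the study; the threshold of 10 simply encodes "more than one order of magnitude".

```python
def fluctuation_magnitude(range_sizes_km2):
    """Maximum range size divided by minimum range size over a time series."""
    smallest = min(range_sizes_km2)
    if smallest <= 0:
        raise ValueError("range sizes must be positive to form a ratio")
    return max(range_sizes_km2) / smallest


def is_extreme_fluctuation(range_sizes_km2, threshold=10.0):
    """True when the max/min ratio exceeds one order of magnitude."""
    return fluctuation_magnitude(range_sizes_km2) > threshold


# Hypothetical time-sliced range sizes (km^2) for two made-up species.
boom_bust = [2_000, 8_000, 30_000]
stable = [100_000, 150_000, 300_000]

boom_bust_magnitude = fluctuation_magnitude(boom_bust)
stable_magnitude = fluctuation_magnitude(stable)
```

Because the ratio depends only on the two extreme time slices, a single unusually small modeled range can push a species over the threshold, which is worth keeping in mind when reading the labeled species.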

Theoretical outcome of monitoring abundance of Black Honeyeater across different geographic locations: (a) overall trend and (b) population dynamics at the core and (c-d) edges of the species’ overall range. A linear relationship between environmental suitability and abundance is assumed. Shading bar represents the mean probability that a pixel is environmentally suitable for the species.

The slopes of linear models showed that pooled geographic range size exceeded the minimum geographic range size by 82.6% (95% CI 7.6), mean geographic range size by 58.5% (95% CI 6.6) and maximum geographic range size by 30.4% (95% CI 5.3) (Fig. 4).

The relationship between pooled geographic range size and the time sliced (i.e., mapped dynamically across time) estimates of maximum ($y = 0.70x - 2.6 \times 10^{4}$, p < 0.001), mean ($y = 0.40x - 4.4 \times 10^{4}$, p < 0.001), and minimum ($y = 0.17x - 1.6 \times 10^{4}$, p < 0.001) range sizes. Bounding lines indicate 95% confidence intervals.


We aimed to provide a complete recipe for how conservationists may apply metapopulation theory to empirical landscapes and how to interpret the results. Two case studies provide specific examples of what may drive large changes in metapopulation capacity and how conservationists may utilize metapopulation models for prioritization of specific patches. For each species analyzed, the first step established a map of habitat patches of high to moderate quality. Variation in habitat quality requires users to determine a relevant threshold. The goal was to use the maps of habitat to refine species ranges for an accurate description of where individuals may occur. With sufficient care and thought, one could use the results of a species distribution model that produces a binary map of habitat versus nonhabitat. Finally, we input the spatial arrangement of species’ land use into a spatially explicit metapopulation model to generate a summary for the entire landscape.

Spatially Explicit Metapopulation Model

We employed a spatially explicit metapopulation model (Schnell et al. 2013a), a modified version of that by Hanski and Ovaskainen (2000). The only change was the inclusion of a self-colonization component that weighs the importance of large patches within a system. We set colonization and dispersal as a function of interpatch distance, for which we calculated closest edge-to-edge pairwise distances for all patches (i and j) within a system, and multiplied this value by the area of the source patch. The product of the rate of colonization with the inverse of the extinction rate provided the ratio that described the likelihood a species would occupy a patch or not. We summarized the entire model as a matrix, M (Eq. 1).

The leading eigenvalue (λ) of this matrix, M, provided the metapopulation capacity, analogous to the effective amount of habitat available to a metapopulation (Hanski & Ovaskainen 2000 ) or effective metapopulation size (Schnell et al. 2013a ). Although one could get a basic understanding of the extent of fragmentation of a landscape through 3 main metrics (number of patches, average inter-patch distance, and total area), λ is related to extinction risk directly due to its relation to the extinction threshold set by the ratio of within-patch extinction to colonization rates (details on how these metrics independently influence λ are in Supporting Information).
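To make the idea of metapopulation capacity concrete, the sketch below builds a small landscape matrix and extracts its leading eigenvalue by power iteration. The matrix form is an assumption, not the source's Eq. 1: off-diagonal entries are taken as A_i^x · A_j · exp(−d_ij/α), in the spirit of Hanski and Ovaskainen (2000), and the diagonal as A_i^x · A_i to stand in for self-colonization. The function names and the example patches are hypothetical.

```python
import math


def capacity_matrix(areas, distances, alpha, x=0.5):
    """Assumed landscape matrix: A_i**x * A_j * exp(-d_ij / alpha).

    The diagonal uses a kernel of 1 (distance zero) as a stand-in for
    self-colonization. `alpha` must be in the same units as `distances`.
    """
    n = len(areas)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            kernel = 1.0 if i == j else math.exp(-distances[i][j] / alpha)
            matrix[i][j] = (areas[i] ** x) * areas[j] * kernel
    return matrix


def leading_eigenpair(matrix, iterations=1000, tol=1e-12):
    """Power iteration for a matrix with strictly positive entries.

    Returns (eigenvalue, eigenvector), with the eigenvector scaled to sum to 1.
    """
    n = len(matrix)
    vector = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        estimate = sum(product)  # equals lambda once `vector` has converged
        vector = [value / estimate for value in product]
        converged = abs(estimate - eigenvalue) <= tol * estimate
        eigenvalue = estimate
        if converged:
            break
    return eigenvalue, vector


# Hypothetical three-patch system: areas in km^2, distances and alpha in metres.
areas = [2.0, 1.0, 0.5]
distances = [
    [0.0, 800.0, 3000.0],
    [800.0, 0.0, 2500.0],
    [3000.0, 2500.0, 0.0],
]
capacity, contributions = leading_eigenpair(capacity_matrix(areas, distances, alpha=1000.0))
```

The returned eigenvector gives a relative weight per patch, which is one way to read each patch's contribution to the overall capacity under these assumptions.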


Unfortunately, the lack of specific information on rates of dispersal for each endemic species forced us to use the same parameters for all avian species. This allowed us to compare the ability of similar landscapes to sustain a species and determine how landscape changes affected species over time.

We ran a sensitivity analysis by calculating the relative change in λ when we varied the α value from 100 m to 5000 m. We based this on work by Martin and Fahrig (2018) who produced an extensive list of average dispersal distances of bird species from North America and the United Kingdom for a wide variety of taxa. To identify impacts of varying α on species with a variety of range sizes, we ran the sensitivity analysis on 3 different species: Mangrove Hummingbird (Amazilia boucardi) (patch number = 405, total habitat area = 351 km²), Blood-colored Woodpecker (Veniliornis sanguineus) (patch number = 4041, total habitat area = 1357 km²), and a subspecies of the Plain-bellied Emerald (Amazilia leucogaster leucogaster) (patch number = 12,021, total habitat area = 10,256 km²).

Unlike dispersal, the extinction function was relatively similar across taxa, and most species could be accurately parameterized with a value of 0.5 for x (Gilpin & Diamond 1976; Hanski & Ovaskainen 2000; Schnell et al. 2013a, 2013b).

Identifying Endemic Metapopulations

To identify mangrove endemic bird species, we first selected all avian species that BirdLife International (2017) classified as using mangroves as a major habitat. We defined these species as mangrove specialists. Although the loss of mangrove habitat would assuredly have a significant impact on the long-term persistence of these species, many of them may also utilize nearby lowland forest and salt marshes. A drawback of our metapopulation approach is that it is based on the assumption that the landscape is binary (habitat and nonhabitat) and that we were modeling the entirety of a metapopulation's patch system. Thus, we further refined the list of mangrove species to those that were obligate mangrove endemics. Starting with the list of specialists, we performed an extensive literature search across a variety of sources, including the Handbook of the Birds of the World and eBird (Luther & Greenberg 2009; Sullivan et al. 2009; del Hoyo et al. 2017). With these various sources, we determined if the species habitat requirements restricted their occurrence to mangrove forests and if they are regularly observed in such areas.

As part of this process, we noticed that many species had some subspecies or populations that were mangrove endemics and some subspecies or populations that were not. Consequently, we determined endemicity based on subspecies classifications rather than limiting ourselves to the species level. Taxonomic opinions on allopatric species and subspecies often change.

The last step was to split the species or subspecies ranges into independent metapopulations. Using van Houtan et al.’s (2007) dispersal function as a guide, we calculated that an individual had a <1% chance of colonizing a patch that was >75 km away. Thus, we delineated independent metapopulation boundaries as a cluster of patches, where every patch was within 75 km of at least one other patch in that cluster. Clusters separated by >75 km (between the 2 nearest patches) were considered distinct metapopulations. If the ranges of 2 or more subspecies were close enough to permit a >1% chance of dispersal between patches, we considered all subspecies involved as a single metapopulation. Given that we delineated metapopulations based on dispersal ability, it is important that users of this method have some confidence that there is little to no exchange of individuals between defined metapopulations, a situation that might not be true for species with large average dispersal distances.
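The clustering rule above (every patch within 75 km of at least one other patch in its cluster) amounts to finding connected components of a graph whose edges join patches no more than 75 km apart. A minimal sketch using a union-find structure follows; the function name, the distance-matrix input, and the example distances are hypothetical, and the source's own implementation may differ.

```python
def delineate_metapopulations(distances_km, threshold_km=75.0):
    """Group patch indices into clusters linked by distances <= threshold_km."""
    n = len(distances_km)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distances_km[i][j] <= threshold_km:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical edge-to-edge distances (km): patches 0-1-2 form a chain,
# patch 3 is isolated from all of them.
example_distances = [
    [0.0, 50.0, 100.0, 200.0],
    [50.0, 0.0, 50.0, 250.0],
    [100.0, 50.0, 0.0, 300.0],
    [200.0, 250.0, 300.0, 0.0],
]
metapopulations = delineate_metapopulations(example_distances)
```

Note that patches 0 and 2 end up in the same cluster even though they are 100 km apart, because patch 1 links them; this chaining is inherent to the rule as stated.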

Habitat and Forest Cover Changes

The second required input for a spatially explicit metapopulation model was a detailed map of potential habitat patches. Giri et al. (2011) provided the most current, accurate, and highest resolution map of global mangrove habitat at a 30-m resolution. Unfortunately, this level of detail was only available globally for the year 2000. To determine how fragmentation changed, we incorporated another source of forest change (Hansen et al. 2013). Thus, to model potential habitat for 2015, we overlaid the mangrove data from 2000 (Giri et al. 2011) with the Hansen et al. (2013) forest-change data (available through 2017). We removed areas that lost forest from 2000 to 2015. A disadvantage to this approach was that it did not allow us to model potential expansion or reforestation of mangrove habitat since 2000. This approach restricted our results to only decreases in metapopulation capacity. Given the rapid rates of deforestation and the difficulty in accurately identifying mangrove forests from satellite imagery, we believe that this method is the best available at present.
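The overlay step can be pictured as a cell-by-cell mask on two aligned rasters. The sketch below uses plain nested lists for clarity; the function name and the loss-year coding (0 = no loss, 1 to 15 = loss recorded in 2001 to 2015) are assumptions for illustration, and a real workflow would read the rasters with a geospatial library and check that the grids align.

```python
def habitat_after_loss(mangrove_2000, loss_year, last_year_code=15):
    """Keep cells that were mangrove in 2000 and show no loss up to last_year_code.

    Assumes both grids have identical shape and alignment, and that
    loss_year uses 0 for "no loss" and k for "loss in year 2000 + k".
    """
    return [
        [bool(habitat) and not (1 <= year <= last_year_code)
         for habitat, year in zip(habitat_row, year_row)]
        for habitat_row, year_row in zip(mangrove_2000, loss_year)
    ]


# Hypothetical 2 x 3 grids.
mangrove_2000 = [
    [1, 1, 0],
    [1, 1, 1],
]
loss_year = [
    [0, 3, 5],
    [16, 15, 0],
]
habitat_2015 = habitat_after_loss(mangrove_2000, loss_year)
cells_2000 = sum(map(sum, mangrove_2000))
cells_2015 = sum(cell for row in habitat_2015 for cell in row)
```

Because cells can only be removed, never added, the 2015 habitat is always a subset of the 2000 habitat, which mirrors the limitation noted in the text.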

Calculating Metapopulation Capacity and Changes

Once we identified a mangrove-endemic metapopulation, we then refined its range to the habitat patches determined from the forest-cover data. The resulting map depicted the total amount of potential habitat available to the metapopulation. We removed all fragments of <1 ha for computational ease. Although these patches are unlikely to sustain a local population, our method could not account for the use of these patches as stepping stones in facilitating dispersal (Boscolo et al. 2008). Using R version 3.4.2 (R Core Team 2018) and the rgdal package (Bivand et al. 2018), we identified the fragment areas and pairwise nearest edge-to-edge interpatch distances necessary for the metapopulation model (Eq. 1) for years 2000 and 2015 (sample code in Supporting Information). We calculated the resulting λ and the dominant eigenvector of the same matrix, the latter of which informs us of each patch's contribution to the overall metapopulation capacity.

After calculating the relative change in λ from 2000 to 2015 for each metapopulation, we summarized the data with 2 methods. The first was a simple summation of the relative changes in λ across all overlapping metapopulation ranges, and the second was an average of these changes across the same overlaps. The first method identified which communities of endemic species declined the most in metapopulation capacity, and the second determined which landscapes were the most affected. Each revealed different metapopulation impacts of habitat fragmentation on endemic communities, an approach we did not see in the literature. We then repeated these 2 summary methods for percent habitat area loss across each metapopulation range (except for one Mangrove Robin [Peneonanthe pulverulenta pulverulenta] metapopulation for reasons we explain below) to compare the results of a metapopulation approach to the more traditional area-only approach.
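The two summaries described above can be sketched as a per-cell sum and a per-cell mean of the relative changes in λ of every metapopulation whose range overlaps that cell. The data layout (a mapping from cell to overlapping metapopulation names), the function name, and the values are hypothetical; they only illustrate how the two summaries can rank places differently.

```python
def summarize_overlaps(cell_members, relative_change):
    """Return {cell: (sum, mean)} of relative lambda change across overlapping ranges."""
    summary = {}
    for cell, members in cell_members.items():
        changes = [relative_change[name] for name in members]
        if changes:  # skip cells with no overlapping metapopulation
            total = sum(changes)
            summary[cell] = (total, total / len(changes))
    return summary


# Hypothetical relative changes in lambda from 2000 to 2015 (negative = decline).
relative_change = {"A": -0.5, "B": -0.25, "C": -0.75}

# Hypothetical grid cells and the metapopulations whose ranges cover them.
cell_members = {
    "cell_1": ["A", "B"],
    "cell_2": ["C"],
    "cell_3": [],
}
overlap_summary = summarize_overlaps(cell_members, relative_change)
```

In this toy case the two cells tie on the summed change, while the mean singles out cell_2 as the more heavily affected landscape, which is the distinction the two methods are meant to capture.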

Results and Discussion

Our analyses reveal that terrestrial mammal species with higher degrees of habitat fragmentation within their ranges are at greater risk of extinction (Fig. 1). Species classified as threatened in the IUCN Red List had higher levels of fragmentation compared with species classified as Least Concern and Near Threatened (phylogenetic generalized linear model β ± SE = −0.16 ± 0.05, z = −2.94, P = 0.003) (Table S1, Upper, model 1). Importantly, degree of fragmentation improved prediction of extinction risk even after accounting for the effects of key macroecological extinction risk predictors such as body size (β ± SE = 0.42 ± 0.03, z = 12.22, P < 0.001) and range size (β ± SE = −0.66 ± 0.03, z = −19.23, P < 0.001) (Table S1, Upper, model 1). Mammals with more fragmented habitat, smaller ranges, and larger body sizes face the highest risk of extinction. The model including fragmentation along with body size and range size had the strongest empirical support from the data, with a model probability of 88% (Table S1, Upper, model 1). This top model was 7.4 times more likely than the next ranked model (Table S1, Upper, model 2), which excluded fragmentation [model probability = 12%; ΔAIC (Akaike’s Information Criterion) = 4]. Furthermore, when assuming Near Threatened species face some extinction risk, a conservative and precautionary approach (13), the second-ranked model without fragmentation had very little empirical support (Table S1, Lower, model 2) (model probability < 0.001; ΔAIC = 15) and was 1,808 times less likely compared with the top model including fragmentation (Table S1, Lower, model 1) (model probability = 1.00).
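The model probabilities and likelihood ratios quoted above are consistent with the standard Akaike-weight formulas, in which a model's relative likelihood is exp(−ΔAIC/2). The sketch below assumes, for simplicity, that only the two models being compared contribute to the weights; the function names are illustrative.

```python
import math


def akaike_weights(delta_aic):
    """Model probabilities from AIC differences relative to the best model."""
    relative_likelihoods = [math.exp(-0.5 * delta) for delta in delta_aic]
    total = sum(relative_likelihoods)
    return [value / total for value in relative_likelihoods]


def evidence_ratio(delta_aic):
    """How many times more likely the best model is than one delta_aic units worse."""
    return math.exp(0.5 * delta_aic)


weights_delta_4 = akaike_weights([0.0, 4.0])   # top model vs. model without fragmentation
ratio_delta_4 = evidence_ratio(4.0)
ratio_delta_15 = evidence_ratio(15.0)
```

Under this two-model assumption, ΔAIC = 4 gives weights of about 0.88 and 0.12 and an evidence ratio of about 7.4, and ΔAIC = 15 gives a ratio of about 1,808, matching the figures in the text.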

Degree of habitat fragmentation predicts extinction risk for the world’s terrestrial mammals. The fragmentation metric, measuring the amount of core (i.e., interior) habitat distributed within intact high-suitability patches, was ln-transformed and then inverse-coded so high values represent high degrees of fragmentation. Bars represent means and SE (n = 4,018 species). Extinction risk assessed by IUCN Red List threat status. Vulnerable, Endangered, and Critically Endangered species had higher levels of habitat fragmentation compared with Least Concern and Near Threatened species. Similarly, Near Threatened and Data Deficient species had higher levels of fragmentation than Least Concern species (see main text).

Phylogenetic generalized linear model results

Range size was the most important predictor of extinction risk, occurring in all top models (Table S1), consistent with prior findings identifying range size as a key extinction risk correlate (12, 14–16). Data from range size alone, however, can provide misleading information on conservation status, potentially misclassifying naturally narrow-ranging species as threatened and wide-ranging species as nonthreatened (17) and incorrectly assuming species to be homogeneously distributed throughout their range (10, 18). Our analyses indicated that fragmentation, consistently in the most supported models, had explanatory power beyond that provided by range size alone. Indeed, our models implicate habitat fragmentation as a potential mechanism underlying the well-known relationship between range size and extinction risk, empirically demonstrating that greater fragmentation in small-ranged species (r = 0.43, phylogenetic generalized least-squares β ± SE = 0.17 ± 0.01, t4,018 = 34.44, P < 0.001) (Table S2, model 1) contributes to elevated extinction risk (Fig. 2 and Fig. S1). Large-ranged species tend to be habitat generalists (19), whereas range-restricted species often have more narrow environmental niches and specialized habitat preferences, characteristics that increase extinction risk (20–22). Specifically, range-restricted specialists are particularly vulnerable to habitat fragmentation given discontinuous distributions, reduced local abundance, and sensitivities to anthropogenic disturbances (23, 24). Habitat fragmentation was not associated with body mass (r = −0.02, phylogenetic generalized least-squares β ± SE = −0.01 ± 0.03, t4,018 = −0.40, P = 0.687) (Table S2, model 4).

Terrestrial mammals with higher degrees of habitat fragmentation and smaller geographic range sizes have a greater risk of extinction. Each black point represents an individual species, with the number of red line segments corresponding to extinction risk according to IUCN Red List threat status: Least Concern, Near Threatened, Vulnerable, Endangered, and Critically Endangered (see legend within figure). Visually, across the scatter plot of all points, more red represents higher extinction risk. Fragmentation and geographic range size (km²) were ln-transformed, and the fragmentation metric was then inverse-coded so high values represent high degrees of fragmentation. Vertical and horizontal lines represent means (see also Fig. S1).

Phylogenetic generalized least-squares regression model results

Relationship between habitat fragmentation and geographic range size for world’s terrestrial mammals classified according to IUCN Red List threat status: (A) Least Concern, (B) Near Threatened, (C) Vulnerable, and (D) Endangered (small circles) and Critically Endangered (large circles). Fragmentation and geographic range size (km²) were ln-transformed, and the fragmentation metric was then inverse-coded so high values represent high degrees of fragmentation. Vertical and horizontal lines represent means when pooling among all IUCN threat classes (Fig. 2).

Predictably, species with more fragmented habitat had a lower proportion of high-suitability habitat within their range (r = 0.77, phylogenetic generalized least-squares β ± SE = 2.10 ± 0.03, t4,018 = 77.07, P < 0.001) and a lower proportion of high-suitability habitat within protected areas (r = 0.16, β ± SE = 1.46 ± 0.20, t4,018 = 7.48, P < 0.001), further elevating extinction risk. As sole predictors of extinction risk, the model with fragmentation (β ± SE = −0.89 ± 0.06, z = −15.0, P < 0.001; model probability = 1.00) had considerably more explanatory power (ΔAIC = 387) compared with the model with proportion of high-quality habitat (β ± SE = −0.69 ± 0.12, z = −5.57, P < 0.001; model probability < 0.001). These findings emphasize the utility of measuring not only the proportion of suitable habitat remaining within the range (reflecting habitat loss per se), but also evaluating how such remaining habitat is distributed within large, intact patches of core habitat, as assessed by our fragmentation metric.
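The link between a ΔAIC of this size and a model probability of essentially 1.00 can be illustrated with Akaike weights. The following is a minimal sketch, not the authors' code: the function name `akaike_weights` and the absolute AIC values are hypothetical, chosen only so that the two models differ by the reported 387 units.

```python
import math

def akaike_weights(aic_values):
    """Return Akaike weights (relative model probabilities) for a list of AIC scores."""
    best = min(aic_values)
    # Relative likelihood of each model, based on its AIC difference from the best model.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]

# Hypothetical AIC scores; only their difference (387) comes from the text.
aic_fragmentation = 1000.0
aic_habitat_proportion = aic_fragmentation + 387.0
weights = akaike_weights([aic_fragmentation, aic_habitat_proportion])
```

Under these assumptions the fragmentation model receives practically all of the weight, and the habitat-proportion model receives a weight far below 0.001.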

Notably, the relationship between fragmentation and extinction risk remained evident (β ± SE = −0.59 ± 0.08, z = −7.04, P < 0.001) even after excluding threatened species that met IUCN Red List criterion B, used to list species that have restricted geographic ranges (13). Species listed under criterion B have severely fragmented ranges or exist in few locations (subcriteria B1a/B2a), or are undergoing continuing decline (B1b/B2b) or extreme fluctuations in population size or distribution (B1c/B2c) (13). Exclusion of criterion B species avoids potential circularity between our extinction risk modeling and the IUCN criteria adopted to classify extinction risk, thus providing stronger inference regarding the relationship between fragmentation and extinction risk (25–27). Even when excluding criterion B species, the most-supported models still included fragmentation and had the greatest weight of evidence from the data (Table S3, Upper and Lower, model 1), with the remaining models having little to no support.

Phylogenetic generalized linear model results after excluding threatened species that have met IUCN Red List criterion B

Our quantitative measures of fragmentation also allowed evaluation of extinction risk assessments by IUCN experts to evaluate if a taxon belongs in a threatened category, in particular those assessed using subcriteria (B1a/B2a) relating to habitat fragmentation. As expected, threatened species classified under subcriteria B1a/B2a had significantly more fragmentation than threatened species that did not meet the B1a/B2a subcriteria (phylogenetic generalized linear model β ± SE = −1.00 ± 0.13, z = −7.78, P < 0.001). IUCN Red List assessments are based on expert opinion, which can rely on qualitative knowledge, especially for poorly known species. Consequently, assessments of habitat fragmentation for different species can suffer from limited consistency (9). Furthermore, subcriteria B1a/B2a do not distinguish between the two conditions of fragmentation and restricted number of locations. By quantifying fragmentation specifically, our models thus can improve threat assessment. For example, because of a lack of information regarding fragmentation, a recent attempt to use satellite imagery to consistently assess extinction risk of >11,000 forest-dependent species necessarily assumed that all species with small ranges and declining habitat were also subject to significant levels of fragmentation (9). Our fragmentation models can fill this important methodological gap, allowing more accurate satellite-derived classification of fragmentation and hence extinction risk using Red List criterion B.

Our models also reveal evidence for increased fragmentation in species not classified as threatened by the IUCN (Fig. 1). Specifically, species classified as Near Threatened (phylogenetic generalized linear model β ± SE = −0.46 ± 0.08, z = −5.86, P < 0.001) and Data Deficient (β ± SE = −0.76 ± 0.07, z = −10.67, P < 0.001) had more fragmented habitat than Least Concern species (Fig. 1). Although Data Deficient species have inadequate information to formally assess extinction risk (13), they tend to have smaller body and range sizes (17, 28), are nocturnal and thus difficult to study (28), and many are likely to be threatened (17). It is possible that the range size of many of these poorly known species is underestimated, and the degree of ecological specialization overestimated, because of limited available information (17), potentially inflating our measure of habitat fragmentation. More information regarding the distribution, life history, and ecology of Data Deficient species, including their habitat affinities and responses to human disturbances, will help refine our models. Nonetheless, available evidence suggests that both Near Threatened and Data Deficient species have increased fragmentation within their known ranges, indicating that the threat of fragmentation exists at the earliest and least-understood stages of endangerment. Our models quantifying fragmentation allow us to better identify such emerging threats.

Summing the fragmentation metric across all species reveals global patterns of core habitat and fragmentation for the world’s terrestrial mammals (Fig. 3A). Primary areas of intact high-quality core habitat include northern Africa and much of the Amazon Basin in South America, and portions of western and central North America, sub-Saharan Africa, Australia, and northern, southwestern, and southeastern Asia. Of these areas, the Amazon Basin supports the greatest richness of terrestrial mammals, followed by sub-Saharan Africa and portions of western and central North America and southeastern Asia (Fig. S2A). Standardizing the fragmentation models by species richness more strongly highlights species-poor locales (most notably desert regions of northern Africa and southwestern Asia) with extensive core habitat for the relatively few species that occur there (Fig. S2B). Terrestrial mammalian diversity, however, is sufficiently low in these regions that they are de-emphasized as core habitat in our global fragmentation models (Fig. 3).

Degree of habitat fragmentation for the world’s terrestrial mammals. (A) Degree of habitat fragmentation as indexed by the fragmentation metric, measuring the amount of core (i.e., interior) habitat, and (B) degree of anthropogenic habitat fragmentation, calculated by weighting data in A by a recently developed global HM model (Fig. S3). The resulting map identifies regions that have been fragmented by human development specifically, and de-emphasizes regions that are naturally fragmented such as high-elevation areas and landscapes with water bodies interspersed. The color gradients in the legends are the original (A) and weighted (B) fragmentation values binned into deciles. Blue denotes regions with low fragmentation, where mammal species occur in large patches of intact high-suitability core habitat. Red denotes regions with high fragmentation, where mammal species have little core habitat. Fragmentation metrics are spatially quantified by summing the metric at each 300 × 300-m cell for all terrestrial mammal species worldwide.

Species richness and degree of habitat fragmentation (corrected for species richness) for the world’s terrestrial mammals. (A) Species richness based on extent of suitable habitat. Blue denotes sites with few mammal species, and red denotes sites with the highest species richness. (B) Degree of habitat fragmentation, corrected for species richness by dividing the fragmentation metric within each cell globally by the number of species with suitable habitat within that cell, thus generating an average fragmentation index (Methods). Blue denotes sites with low fragmentation, where terrestrial mammals, averaged across species with suitable habitat at a site, have the most intact high-suitability core habitat. Red denotes sites with high fragmentation, where mammal species on average have little core habitat.
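The per-cell summation and the richness correction described in these captions can be sketched as follows. This is an illustrative toy, not the authors' pipeline: the function name `summarize_cells`, the nested-list grids, and the use of `None` to mark cells without suitable habitat for a species are all assumptions made for the example.

```python
def summarize_cells(species_grids):
    """Sum a per-species metric over species and divide by per-cell species richness."""
    grids = list(species_grids.values())
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    total = [[0.0] * n_cols for _ in range(n_rows)]
    richness = [[0] * n_cols for _ in range(n_rows)]
    for grid in grids:
        for r in range(n_rows):
            for c in range(n_cols):
                value = grid[r][c]
                if value is not None:  # None marks "no suitable habitat for this species"
                    total[r][c] += value
                    richness[r][c] += 1
    # Average metric per species; undefined (None) where no species has suitable habitat.
    mean = [[total[r][c] / richness[r][c] if richness[r][c] else None
             for c in range(n_cols)] for r in range(n_rows)]
    return total, richness, mean

# Toy 2 x 2 grids for two hypothetical species.
toy = {
    "species_a": [[2.0, None], [4.0, 1.0]],
    "species_b": [[1.0, None], [None, 3.0]],
}
total, richness, mean = summarize_cells(toy)
```

The summed grid corresponds to the kind of map in Fig. 3A, the richness grid to Fig. S2A, and the averaged grid to the richness-corrected index in Fig. S2B.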

Conversely, fragmentation hotspots are regions with relatively low interior distances within high-suitability habitat, summed across all species present in an area (Fig. 3A). Such regions include much of South America outside the Amazon Basin, as well as portions of south-central Asia, eastern North America, and Europe. Interestingly, our models identify notable fragmentation for high-latitude (e.g., arctic) and high-elevation (e.g., Himalayan) species (Fig. 3A). For the arctic, the models are primarily identifying natural fragmentation of suitable habitat because of ice, water bodies, coastlines, and islands at the edge of species ranges. Similarly, for high-elevations, the models are identifying patterns of natural fragmentation above the altitudinal limits of species. Weighting the global fragmentation map with a recently developed high-resolution, global human modification layer (29) highlights regions that have been fragmented by human development specifically and de-emphasizes regions with natural fragmentation, such as high-latitude and high-elevation areas (Fig. 3B and Fig. S3). We emphasize, however, that arctic and montane species, including high-altitude endemics, are particularly vulnerable to climate change (30, 31) and thus still impacted by natural fragmentation that might prevent distributional shifts in response to altered climate regimes.

Weighting (A) the fragmentation model with (B) a recently constructed HM model highlights (C) anthropogenic fragmentation for terrestrial mammals. (A) Degree of habitat fragmentation as indexed by the fragmentation metric, measuring the amount of core (i.e., interior) habitat. (B) The HM model combines the effects of multiple stressors (e.g., urban and agricultural land cover, energy production, nighttime lights, and roads) into an overall score of HM. (C) Weighting fragmentation (A) by HM (B) identifies regions that have been fragmented by human development specifically, and de-emphasizes regions that are naturally fragmented such as high-elevation areas and landscapes with water bodies interspersed. Blue denotes regions with low fragmentation (A and C) or HM (B), and red denotes regions with high fragmentation (A and C) or HM (B). Figure focused on Himalayan area in south-central Asia, given that this region represents the utility of our reweighting approach (see main text).
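The text does not give the exact weighting formula, so the sketch below assumes the simplest possibility: a cell-wise product of the fragmentation value and an HM score scaled to the range 0 to 1. The function name `weight_by_hm` and the toy grids are hypothetical.

```python
def weight_by_hm(fragmentation, hm):
    """Cell-wise product of a fragmentation grid and a human-modification grid in [0, 1].

    Assumption: simple multiplication; the source does not state the actual formula.
    """
    same_shape = len(fragmentation) == len(hm) and all(
        len(f_row) == len(h_row) for f_row, h_row in zip(fragmentation, hm)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")
    return [[f * h for f, h in zip(f_row, h_row)]
            for f_row, h_row in zip(fragmentation, hm)]

# Top row: highly fragmented cells, one unmodified (natural) and one fully modified.
fragmentation = [[0.9, 0.9], [0.2, 0.2]]
hm = [[0.0, 1.0], [0.5, 1.0]]
weighted = weight_by_hm(fragmentation, hm)
```

With this assumed weighting, a naturally fragmented cell with no human modification drops to zero, while an equally fragmented but heavily modified cell keeps its full value.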

On average, across the 4,018 species of terrestrial mammals included in our analyses, only 48.6% (range: <0.001–100%; SE = 0.004) of the current geographic range of a species was comprised of high-suitability habitat (10). Moreover, only 3.6% (range: 0–100%; SE = 0.001) of the average range was comprised of high-suitability habitat located within known protected areas, well below the approximately 15% of terrestrial areas that are currently protected globally (32), further emphasizing the inadequacy of the global network of protected areas (33). Of additional concern is that habitat degradation is especially prevalent in many of the global hotspots of mammal habitat identified in our models, particularly tropical regions in the Americas, Africa, and Asia that experience high deforestation (2). For example, much of the tropical forest in the Amazon Basin, a critical global hotspot of core mammalian habitat, had experienced rapid deforestation from human development (34, 35), although it appears that such habitat destruction has slowed recently as a result of policy-driven government action (36).

Our models can inform the management and conservation of mammals globally. First, unlike most comparative extinction-risk analyses, we focus on an urgent yet manageable anthropogenic threat (i.e., habitat fragmentation) rather than solely on intrinsic biological traits (e.g., body mass), which addresses ongoing concerns about the utility of comparative analyses for applied conservation (15, 37). Second, our habitat models narrow the focus of mammal distribution to include only regions of high-suitability habitat; this is critical, because species are not homogeneously distributed throughout their ranges (18) and less than half of the range of terrestrial mammals is on average comprised of high-suitability habitat (10). Third, our fragmentation models not only evaluate global patterns of species richness based on suitable habitat (10) but also quantify the degree to which suitable habitat exists within core habitat patches. This is essential, given that hotspots of species richness and extinction threat may not overlap (38, 39), and our analyses demonstrate that the degree of fragmentation of patches influences extinction risk. Finally, the 300-m resolution of the global-habitat models facilitates more detailed analyses of fragmentation patterns at the local scale, which approaches the scale of conservation action (40). For example, our models can be used to identify the degree to which reserve networks designed for umbrella species, such as jaguars, maintain high-quality core habitat for sympatric mammals (41). Such real-world application of our fragmentation models demonstrates their utility for conservation practitioners, particularly in comparison with simple boundaries of the geographic range, which provide no information about the expected occurrence of species within their broad distributional extents.

Additional efforts to apply these models to local scales, and to validate them with empirical data on fine-scale distribution and habitat use, such as that derived from GPS telemetry or remote camera surveys, will help to more thoroughly assess their utility for real-world conservation application. In addition, exploration of alternative fragmentation and connectivity metrics, including metrics that assess patch isolation and configuration (5, 12), would also yield further insight into how habitat fragmentation and landscape connectivity are related to extinction risk. More complex patch and landscape metrics might be particularly valuable at finer scales or for smaller subsets of species. Development of a comprehensive database estimating dispersal distances for mammal species, and incorporation of such data to assess how variability in species-specific dispersal ability influences scaling of patch sizes and responses to fragmentation effects, would represent another important advancement. Finally, given that anthropogenic fragmentation increases contact and potential conflict between humans and wildlife, human tolerance of and behavior toward wildlife are fundamental determinants of their ability to persist within fragmented landscapes; consequently, social science research will be critical to mitigate fragmentation effects in human-dominated systems (42–44).

Ultimately, habitat fragmentation has severe effects on the composition, structure, and function of ecosystems (3, 5, 8), and our results demonstrate that fragmentation degrades suitable habitat and increases the extinction risk of mammals globally. Such impacts warrant intensified efforts to protect remnant habitat and restore broad-scale landscape connectivity to ameliorate the effects of fragmentation (5, 12). Quantification of fragmentation will help prioritize such global conservation efforts and develop more effective strategies for conserving the world’s mammals.


Although the study was limited to a small part of parameter space (i.e., trees with approximately 1000 leaves and mainly simulated datasets), the study reveals several trends regarding the relative accuracy of alignment and tree estimation methods given datasets that contain a mixture of full-length and fragmentary sequences. These trends also are helpful in understanding the design issues for phylogenetic placement methods, and in choosing between methods. We discuss these trends here, and compare our findings to prior work.

Importance of MSA Method

A main finding of this study is that when datasets have fragmentary sequences, the best accuracy is obtained using an MSA-ML protocol; however, not all MSA methods provide good accuracy. We examined two strategies for computing alignments: estimating the entire alignment in one stage with PASTA, or using a two-stage approach where we use PASTA only to align the full-length sequences and then add the remaining fragmentary sequences into the backbone alignment using either UPP or SEPP. In this study, using the two-stage approach always matched or improved on the alignment accuracy (both SPFN and SPFP) compared to just using PASTA.
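For readers unfamiliar with the two alignment error measures, the sketch below shows one common way to compute them: SPFN as the fraction of residue pairs aligned in the reference alignment but missing from the estimated one, and SPFP as the fraction of pairs in the estimated alignment that are absent from the reference. The function names and the toy alignments are illustrative assumptions, and rows are assumed to appear in the same order in both alignments.

```python
from itertools import combinations

def homologous_pairs(alignment):
    """Return the set of residue pairs that share a column in a gapped alignment."""
    counters = [0] * len(alignment)  # next ungapped position in each sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for row, seq in enumerate(alignment):
            if seq[col] != "-":
                residues.append((row, counters[row]))
                counters[row] += 1
        pairs.update(combinations(residues, 2))
    return pairs

def sp_errors(reference, estimated):
    """Return (SPFN, SPFP); assumes both alignments contain at least one aligned pair."""
    ref, est = homologous_pairs(reference), homologous_pairs(estimated)
    spfn = len(ref - est) / len(ref)  # reference pairs the estimate missed
    spfp = len(est - ref) / len(est)  # estimated pairs not in the reference
    return spfn, spfp

# Toy example: the estimate misplaces a single G in the first sequence.
reference = ["AC-GT", "ACTGT"]
estimated = ["ACG-T", "ACTGT"]
spfn, spfp = sp_errors(reference, estimated)
```

In this toy case three of the four reference pairs are recovered, so both error rates equal 0.25.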

We also saw a slight advantage using UPP rather than SEPP to align the fragmentary sequences. Interestingly, we did not see any noteworthy differences between using FastTree or RAxML to compute the backbone tree, whether using UPP or SEPP. Overall, therefore, these results show that alignment estimation using a two-stage approach produces superior results over PASTA by itself, that the method used to compute the backbone tree on the full-length sequences does not have a significant impact, and that UPP has a slight advantage over SEPP.

These results are consistent with those shown in the paper introducing UPP (see Table 3 and Fig. 3 in Nguyen et al. (2015)), which showed that both alignment and tree error increased more rapidly for PASTA than for UPP as the degree of fragmentation increased. A comparison between SEPP(F) and UPP(F) is also provided in Nguyen et al. (2015) (see Additional File 1, Table S2.1), which also showed that UPP(F) had a small advantage over SEPP(F). Hence our study confirms prior results from Nguyen et al. (2015), and extends these observations to include the impact of how the backbone tree is calculated. Henceforth, when we refer to UPP, we mean either UPP(R) or UPP(F), since the two ways of computing alignments had indistinguishable accuracy.

RAxML vs. FastTree

One of the aspects of the study we performed is a comparison of FastTree and RAxML given alignments that contain fragmentary sequences. To the best of our knowledge, Sayyari et al. (2017) is the only other study that has evaluated RAxML and FastTree under simulation conditions where fragmentation was explicitly included. Sayyari et al. (2017) compared FastTree and RAxML on true alignments with 101 sequences that had fragmentary sequences, each obtained from a single model condition. Because their study was limited to one model condition and only explored true alignments, our study explores additional conditions that vary substantially in rate of evolution and sequence evolution model, in order to better evaluate the differences between these methods given alignments containing fragmentary sequences.

One of the consistent trends in this study is that for many model conditions with fragmentary sequences, FastTree produces less accurate trees than RAxML. This trend is less obvious when used with the PASTA alignment on the fragmentary datasets (which produces generally poorer alignments than the other alignments we tested, resulting in poor trees regardless of the tree estimation method used), but is very obvious when used with the better alignment methods we explored, especially under high fragmentation conditions. In particular, the degree of fragmentation and the rate of evolution impact the difference in FN rate between trees computed using FastTree or RAxML on the UPP alignment, with small differences (or no difference) when fragmentation and alignment error are both low, but increasing differences as fragmentation and alignment error increase. Thus, our study confirms the observation made by Sayyari et al. (2017) that FastTree is less accurate than RAxML given alignments containing fragmentary sequences.
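The FN (false negative) rate used to score trees here is the fraction of non-trivial bipartitions of the true tree that are missing from the estimated tree. A minimal sketch follows; representing trees as nested tuples of leaf names is an assumption made to avoid a Newick parser, and the function names are hypothetical.

```python
def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree given as nested tuples of leaf names."""
    clades = []

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        clades.append(below)
        return below

    all_leaves = leaves_below(tree)
    anchor = min(all_leaves)  # orient every split as the side without this leaf
    splits = set()
    for clade in clades:
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (a single leaf on one side) and the empty root split.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits

def fn_rate(true_tree, estimated_tree):
    """Fraction of true bipartitions missing from the estimate (true tree must be resolved)."""
    true_splits = bipartitions(true_tree)
    return len(true_splits - bipartitions(estimated_tree)) / len(true_splits)

true_tree = ((("A", "B"), "C"), ("D", ("E", "F")))
estimated_tree = ((("A", "C"), "B"), ("D", ("E", "F")))
rate = fn_rate(true_tree, estimated_tree)
```

Here the estimated tree recovers two of the three internal edges of the true tree, giving an FN rate of one third.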

In this context, it is worth recalling Janssen et al. (2018), which compared phylogenetic placement using SEPP-pplacer to their “de novo” method that used MAFFT to compute alignments and then computed trees using FastTree. Although they found that SEPP-pplacer was more accurate than their de novo method, this is likely at least partly due to the use of MAFFT instead of UPP (or even PASTA), and the use of FastTree instead of RAxML, and is consistent with our findings.

One possible explanation for the difference in accuracy between FastTree and RAxML in the presence of fragmentary datasets is that they numerically treat gaps differently. Thus, although treating gaps as “missing data” theoretically should not change the guarantee of statistical consistency (Truszkowski and Goldman, 2016), it has the potential to impact accuracy on a given dataset, and the impact of gaps within sequence alignments on phylogeny estimation is a topic of significant and continued interest in the systematics community (see Lemmon et al. (2009); Wiens (2006); Truszkowski and Goldman (2016); Simmons (2014); Dobrin et al. (2018); Machado et al. (2019); Xia (2019) for an entry to this literature).

Comparing Phylogenetic Placement Methods

We explored pplacer with two different techniques to compute extended alignments (i.e., UPP and SEPP) and possibly constraining the placement to the alignment subset selected by the ensemble of profile Hidden Markov Models technique when used with SEPP. These results show that using pplacer with UPP improves accuracy compared to using pplacer with SEPP, and that the unconstrained use of pplacer is more accurate than the constrained version. The improvement we observed for the unconstrained version over the constrained version of pplacer, which only allows it to place fragments into the subtree of the backbone tree selected by SEPP during the alignment stage, is consistent with results shown in Figure 1 from Mirarab et al. (2012).

Our evaluation of APPLES was limited to its use with UPP, which had the best accuracy of all alignment methods. However, our study shows that pplacer was always more accurate than APPLES, given the same backbone tree and UPP alignment. The improvement of pplacer over APPLES was higher for the high fragmentation conditions than the low fragmentation conditions, and higher for the datasets that were difficult to align than the datasets where alignment error was generally low. However, even for the model conditions with low fragmentation, the differences could be large (e.g., the difference in accuracy on the two biological datasets with low fragmentation was in the 7-9% range). We conclude that pplacer is at least as accurate as APPLES for placing fragmentary sequences into backbone trees when the backbone trees are not too large (i.e., have at most 1000 leaves).

The only prior study that compared APPLES to pplacer is Balaban et al. (2020), which explored APPLES and pplacer for placing full-length sequences into backbones and used the true alignment rather than estimated alignments. One major finding in Balaban et al. (2020) is the propensity of pplacer to fail when the backbone tree was too large: in particular, they found that pplacer failed on many datasets where the backbone tree had 5000 leaves and always failed on backbone trees with 10,000 leaves. For this reason, our study did not compare APPLES and pplacer on such large backbone trees. When restricted to conditions where the backbone trees had at most 1000 leaves, Balaban et al. (2020) found that pplacer had better accuracy than APPLES, though they used a different criterion to evaluate accuracy than we did (specifically, they used “placement accuracy”, which is the distance between the estimated placement for the fragment and the true placement, while we used the error in the final tree). Balaban et al. (2020) observed that pplacer was approximately 10% more accurate than APPLES for placement accuracy on 1000-taxon RNASim subsamples (see Table 3 and Fig. 3 in Balaban et al. (2020)), while we have a difference in FN rate of 14% and 20% on RNASim under low and high fragmentation, respectively. The relative performance observed between APPLES and pplacer is thus the same between the two studies (i.e., APPLES is less accurate than pplacer), but the criteria are different and the details of the study (fragmentary versus full-length sequences, true versus estimated alignments) are also different. Finally, although we restricted our study to datasets with backbone trees limited to at most 1000 sequences, we explored a wider range of model conditions than explored in Balaban et al. (2020), including both easier and harder model conditions than RNASim (which is the only source of datasets examined in Balaban et al. (2020)).

Impact of Dataset Properties on Performance

Because we observed that tree error was largely driven by alignment error (for both types of tree estimation methods, whether based on maximum likelihood on estimated alignments or using phylogenetic placement), the model conditions can be characterized as easy or difficult based on the alignment error rates we observed. With this context, the easiest model condition we explored was 1000M4, which is a simulated dataset generated under a modification of the GTRGAMMA model to allow for insertions and deletions, but with overall low rates of substitutions and indels. The other ROSE simulation conditions have higher rates of evolution than 1000M4, with 1000M1 having the highest rate (and being the hardest dataset in our collection). In terms of alignment error, RNASim and RNASim2 both fall in the middle of the ROSE conditions, despite each having a lower average p-distance than even 1000M4. Alignment error on the biological datasets 16S.M and 23S.M is high, placing them between 1000M1 and 1000M2 in terms of difficulty, even though they have even lower average p-distances than RNASim.

Since the evolutionary process operating on the ROSE datasets is much simpler than the evolutionary process used to generate the RNASim data, and of course the evolutionary processes under which the biological datasets evolved are also more complex than the ROSE simulation, an obvious explanation is that alignment error is higher on the RNASim and biological datasets because their sequence evolution is more complex than is modelled by ROSE. However, another possibility is that there is some other empirical property of the model condition that is making for alignment challenges. For example, it may be that the existence of very long branches in the tree may make alignment estimation difficult, which would be consistent with the observation that the biological datasets have low average p-distances but high maximum p-distances, and are difficult to align.
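The average and maximum p-distances referred to above can be computed directly from an alignment. The sketch below is illustrative only: the function names and toy sequences are hypothetical, and ignoring any column where either sequence has a gap is an assumption, since the text does not say how gaps were handled.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Fraction of differing sites among columns where neither sequence has a gap."""
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no shared ungapped sites")
    return sum(a != b for a, b in compared) / len(compared)

def p_distance_summary(alignment):
    """Return (average, maximum) p-distance over all pairs of aligned sequences."""
    distances = [p_distance(a, b) for a, b in combinations(alignment, 2)]
    return sum(distances) / len(distances), max(distances)

# Two near-identical sequences plus one divergent sequence (a "long branch").
alignment = ["ACGTACGT", "ACGTACGA", "TTTTAC-T"]
average, maximum = p_distance_summary(alignment)
```

In this toy alignment the single divergent sequence pushes the maximum well above the distance between the two similar sequences, which is the kind of gap between average and maximum p-distance that the long-branch explanation relies on.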

Model conditions that produce higher differences in alignment error also seem to produce larger differences in tree estimation error, but there were conditions with relatively small differences in alignment error that resulted in large differences in tree error. For example, the largest difference in alignment error for the low fragmentation conditions was on the 1000M1 condition (which had the highest alignment error rates), where the PASTA and UPP alignments differed in SPFN error by 4% and yet the RAxML trees on the PASTA and UPP alignments differed in FN error by 10%. Thus, while the relative accuracy of trees followed the relative accuracy of the alignments on which they were based, the degree of improvement depended on the actual condition, with larger differences in trees for conditions with high alignment error.

Conservation Genetics


Conservation genetics is the application of genetics to reduce the risk of population and species extinctions. It deals with genetic factors causing rarity, endangerment, and extinction (inbreeding and loss of genetic diversity), genetic management to minimize these impacts, and the use of genetic markers to resolve taxonomic uncertainties in threatened species, to understand their biology, and to detect illegal hunting or trade in threatened species. It is an applied discipline that draws on evolutionary and molecular genetics.

The need to conserve species arises because the biological diversity of the planet is rapidly being depleted as a direct or indirect consequence of human actions. An unknown but large number of species are already extinct, while many others have reduced population sizes that put them at risk. Many species now require human intervention to optimize their management and ensure their survival. The scale of the problem is enormous: 56% of mammals, 58% of birds, 62% of reptiles, 64% of amphibians, and 56% of fish are categorized as threatened by the International Union for Conservation of Nature.

Four justifications for maintaining biodiversity have been advanced: the economic value of bioresources, ecosystem services, aesthetics, and the right of living organisms to exist. The IUCN recognizes the need to conserve biodiversity at three levels: genetic diversity, species diversity, and ecosystem diversity. Genetics is directly involved in the first two of these.

Estimating terrestrial biodiversity through extrapolation

Both the magnitude and the urgency of the task of assessing global biodiversity require that we make the most of what we know through the use of estimation and extrapolation. Likewise, future biodiversity inventories need to be designed around the use of effective sampling and estimation procedures, especially for 'hyperdiverse' groups of terrestrial organisms, such as arthropods, nematodes, fungi, and microorganisms. The challenge of estimating patterns of species richness from samples can be separated into (i) the problem of estimating local species richness, and (ii) the problem of estimating the distinctness, or complementarity, of species assemblages. These concepts apply on a wide range of spatial, temporal, and functional scales. Local richness can be estimated by extrapolating species accumulation curves, fitting parametric distributions of relative abundance, or using non-parametric techniques based on the distribution of individuals among species or of species among samples. We present several of these methods and examine their effectiveness for an example data set. We present a simple measure of complementarity, with some biogeographic examples, and outline the difficult problem of estimating complementarity from samples. Finally, we discuss the importance of using 'reference' sites (or sub-sites) to assess the true richness and composition of species assemblages, to measure ecologically significant ratios between unrelated taxa, to measure taxon/sub-taxon (hierarchical) ratios, and to 'calibrate' standardized sampling methods. This information can then be applied to the rapid, approximate assessment of species richness and faunal or floral composition at 'comparative' sites.
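As a concrete illustration of the two estimation problems named above, the sketch below implements one widely used non-parametric richness estimator (Chao1, based on the numbers of species seen exactly once and exactly twice) and a simple complementarity measure (the share of the combined species list found at only one of two sites). Whether these match the specific methods presented in the paper is an assumption; the function names and data are hypothetical.

```python
def chao1(abundances):
    """Chao1 lower-bound estimate of species richness from per-species individual counts."""
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)
    f1 = sum(1 for n in counts if n == 1)  # singletons
    f2 = sum(1 for n in counts if n == 2)  # doubletons
    if f2 == 0:
        # Bias-corrected form, used here to avoid dividing by zero.
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 ** 2 / (2.0 * f2)

def complementarity(site_a, site_b):
    """Fraction of the combined species list that occurs at only one of the two sites."""
    a, b = set(site_a), set(site_b)
    return len(a ^ b) / len(a | b)

# Toy sample: 8 observed species, 4 singletons, 2 doubletons.
estimate = chao1([10, 5, 2, 2, 1, 1, 1, 1])
distinctness = complementarity({"sp1", "sp2", "sp3"}, {"sp2", "sp3", "sp4"})
```

For the toy sample, the many singletons raise the estimate from 8 observed species to 12, and the two sites share half of their combined species list.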