Background

The Berberidaceae (Ranunculales) is an early-diverging eudicot plant family comprising 19 genera, including the newly proposed Alloberberis P.H. Raven ex C.C. Yu & K.F. Chung and Moranothamnus P.H. Raven ex C.C. Yu & K.F. Chung [1, 2]. The 680 + Berberidaceae species are predominantly distributed in northern temperate zones extending to Andean South America and northern Africa [3,4,5]. The barberry family is traditionally known for its morphological diversity, intercontinental discontinuous distribution and medicinal utilization [5, 6].

Mahonia Nutt. is the second largest genus in Berberidaceae, comprising about 100 species [7]. However, the precise number of Mahonia species remains ambiguous, as 33 species were synonymized in the Flora of China [4]. Morphologically, the species of Mahonia are easily distinguished from other angiosperm species by their evergreen odd-pinnately compound leaves, their leaflets margins with spinose dentation, and their spineless stems [4]. The species of Mahonia are distributed in East Asia and Western North America [1, 8], making the genus an emblematic example of a biogeographic disjunction. Besides, a few species of Mahonia are endemic to Europe, North Africa and South America [8]. Many species of Mahonia are broadly cultivated for horticulture (Fig. 1) and for their pharmacological properties [4, 7, 9]. For instance, the stems of M. bealei (Fortune) Carrière and M. fortunei (Lindl.) Fedde are known as Caulis Mahoniae with highly anti-inflammatory properties [10] and are included in the Chinese pharmacopoeia [9].

Fig. 1
figure 1

Morphological diversity of Mahonia species. A B M. bealei. A fruits. B racemose inflorescence. D M. hancockiana. C E H M. fortunei. C a single flower. E compound leaves. H racemose inflorescence. F M. fordii. G M. eurybracteata subsp. ganpinensis. I M. napaulensis. J M. bordinieri. K M. shenii. L M. breviracema. M M. oiwakensis. N M. duclouxiana

The position of the genus Mahonia remains intractable and has been discussed for a long time [1, 11, 12]. Traditionally, morphological and molecular evidence indicated that Mahonia was paraphyletic, with Mahonia sect. Horridae being sister to Berberis L. sensu stricto [5, 12, 13]. Although several authors held a view that Mahonia should be subsumed under a broadly defined Berberis (Berberis sensu lato) [12,13,14], a great majority of researchers advocate for a paraphyletic Mahonia because of its compound leaves that are distinct from the simple leaves of Berberis [7, 11, 12, 15]. Yu and Chung [1] proposed a new classification that divided Berberis s.l. into four monophyletic clades and establishing four new genera (BerberisBerberis s.s., Mahonia ≡ core Mahonia, AlloberberisMahonia sect. Horridae, and MoranothamnusBerberis claireae). This taxonomic treatment not only maintained the universally acceptable perception of Berberis but also resolved Mahonia as a monophyletic genus, which has been widely accepted by botanists in the fields of phylogenomics [16], taxonomy [17], and biogeography [8].

The genus Mahonia is taxonomically and phylogenetically challenging, owing to its considerable species richness, to the rapid diversification events that have punctuated its evolutionary history, and to the high similarity in the morphology of reproductive structures that hinders the easy and accurate identification of species [7, 8, 18]. Within Mahonia, floral organs are usually invariable in number and are arranged in whorls, and all Mahonia species bear similar yellow flowers and blue-black globose berries [4, 7].

Previously, a series of comparative plastome analyses found that a large IR expansion of over 12 kb occurred in M. bealei, which is unusual in plastome evolution [19, 20]. Using ITS (nuclear ribosomal DNA) and four DNA fragments (including the genes accD, ndhF, rbcL and the intergenic spacer trnH-psbA) of the plastid genome, Yu and Chung [1] proposed a new classification that recognized Mahonia as a distinctive monophyletic genus with strong support. The oldest reliable fossil records of Mahonia were collected in East Asia and used for biogeographic analyses [18]. On the basis of molecular dating estimates and comparison of leaf morphologies of extant Mahonia species, the researchers inferred that the genus Mahonia originated in Western North America and subsequently dispersed into East Asia. Notably, after the migration to East Asia, the genus Mahonia probably underwent a radiation event, leading to the current Eastern Asian biodiversity center [8, 18].

Phylogenetic analyses based on molecular datasets provide reasonable phylogenetic hypotheses [21,22,23,24]. However, it is disputable that using just a single line of evidence is sufficient to delimit species boundaries [25]. Therefore, multiple evidence (such as including plastome datasets, morphological traits, ecological traits) should be applied to modern systematics [26], in particular with respect to recently diverging lineages, such as Mahonia. Complete plastome data have proven to be effective in resolving phylogenetic relationships at a wide range of taxonomic levels [16]. The phylogenetic incongruence between the plastome tree and the nrDNA tree indicated that frequent hybridization has occurred between Mahonia and Berberis [2].

Characters of the apex of petals, the length of pedicels and bracts, the number of leaflets and spinose dentations, were used as critical morphological traits for discriminating the species of Mahonia [3, 4]. Given the stability and uniformity of micromorphological traits among taxa, researchers have undertaken a series of investigations to provide more evidence for the classification of Mahonia (e.g., floral anatomy [27]; seed micromorphology [28]; carpel micromorphology [29, 30]; sepal morphology [31]). Structural characters of leaf epidermis are usually constant and more accessible; they have been proven to possess great systematic significance in some complex taxa [32,33,34]. However, studies of leaf epidermal micromorphology with respect to the genus Mahonia is far from sufficient.

Here, we sequenced 17 representative complete plastomes of the genus Mahonia, and used 13 plastomes from GenBank to conduct comparative and phylogenetic analyses. Morphological and micromorphological traits of different species of Mahonia were recorded. We combined the evidence from the molecular and morphological data to resolve the phylogenetic relationships in Mahonia. Our goals are to 1) reconstruct phylogenetic relationships within Mahonia using nuclear internal transcribed spacer (ITS) and plastid genome sequences; 2) describe and interpret the plastome structure and evolution of Mahonia; 3) explore an integrative method for better distinguishing among Mahonia species.

Results

Plastome features of Mahonia

The number of raw paired-end reads for each plastome ranges from 15,790,898 (M. breviracema Y.S. Wang & P.G. Xiao RC611 [MZ158268]) to 25,347,900 (M. duclouxiana Gagnep. RC602 [MZ086770]) (Table 1). The assembled plastid genomes range from 165,216 bp (M. napaulensis DC. RC603 [MZ158275]) to 165,928 bp (M. shenii Chun RC609 [MZ158280]) in length with 38% to 38.1% genomic GC contents overall. The GC contents in inverted regions (IR, 41.1%–41.2%) are much higher than in the large single copy (LSC) and in the small single copy (SSC) regions (Table 1). The typical quadripartite configuration of these plastid genomes consisted of IR (36,641 bp–36,864 bp), which are separated by LSC (73,198 bp–73,703 bp) and SSC (18,563 bp–18,873 bp) regions (Fig. 2, Table 1).

Table 1 Summary of 17 complete chloroplast genomes of Mahonia
Fig. 2
figure 2

Gene map of Mahonia chloroplast genome. The two gray arrows indicate the direction of gene transcription. The dashed area in the inner circle indicates the GC content of the plastome. LSC: large-single-copy; SSC: small-single-copy; IR: inverted repeat

The Mahonia species we sequenced encode 113 unique genes, 34 of which are duplicated in the IR. A total of 79 protein-coding, 30 transfer RNA (tRNA) and four ribosomal RNA (rRNA) genes are successfully predicted. Each IR copy contains 23 protein-coding, seven tRNA and four rRNA genes. In total, 147 genes are included in the Mahonia plastid genomes we reconstructed (Table S1). There are 18 unique intron-containing genes in the plastid genomes. Sixteen genes (six tRNA and ten protein-coding genes) have a single intron, and the other two (ycf3 and clpP) possess two introns.

Comparative plastid genome analyses

Using an annotated plastid genome (Mahonia bealei RC601 [MZ158266]) as reference, we plotted two graphs for the overall sequence identity of ten Mahonia species and their outgroups using the program mVISTA (Figs. 3, S1). The results reveal that there are only slight variations within Mahonia plastid genomes. These variations are usually observed in the intergenic spacers (IGS) instead of coding-regions, which implies that coding regions are more conserved than non-coding regions (Fig. 3). The whole plastid genome of M. bealei RC601 [MZ158266] is also compared with those of Berberis aristata DC., Ranzania japonica (T. Itô ex Maxim.) T. Itô, Gymnospermium kiangnanense (P.L. Chiu) Loconte, Leontice armeniaca Boivin, Caulophyllum robustum Maxim. and Nandina domestica Thunb. However, the results reveal that there is a significant divergence in terms of sequence length, gene order and content among the genera related to Mahonia (Fig. S1). A large-scale IR expansion was found only in the plastomes of Berberideae, resulting in the additional duplication of 15 genes compared with typical angiosperm plastomes (Figs. S1, S2). The plastome size of M. bealei is about 165 kb and harbored more genes than other genera. Due to incomplete duplication of the normal copy, the gene ycf1 across the IRb-SSC boundary is truncated to ca. 1346 bp and recognized as a pseudogene (ψycf1). Out of the three exons of the trans-splicing gene rps12, two are duplicated in the IR. Gene rearrangement is not observed within Mahonia plastid genomes (Fig. S3).

Fig. 3
figure 3

Visualization of alignment of ten plastomes of the species of Mahonia chloroplast genomes using mVISTA. M. bealei RC601 was used as a reference sequence. Blue represents coding regions, pink represents non-coding regions and gray arrows point at genes

We compared the pairwise sequence distances and the number of nucleotide substitutions of 19 Mahonia species from 20 individuals. The highest level of pairwise sequence distance rate and the number of nucleotide substitutions (0.00546, 900 bp) is detected in the pair of M. pinnata and M. japonica. The lowest level (0, 0 bp) is observed between M. bealei RC601 [MZ158266] and M. bealei [MH795308] (Table S2).

Contraction and expansion of IR regions

We compared the IR-SC boundaries among seven plastid genomes from different genera of Berberidaceae, and showed that the contraction and expansion of IR varied among different genera of Berberidaceae. Visualizing these whole plastomes, we observed a large expansion of IR in Mahonia bealei, as well as in Berberis aristata. As a result, about 12 kb corresponding to 15 genes (including rps19, rpl22, rps3, rpl16, rpl14, rps8, infA, rpl36, rps11, petD, petB, psbH, psbN, psbT and psbB) had suffered an additional duplication compared with the rest of the species we studied (Figs. S1, S2, Table S1). Thus, the IRb-LSC boundaries in these three species are located upstream of the psbB gene rather than located within the rps19 gene, which is observed in the other four species we studied (Caulophyllum robustum, Gymnospermium kiangnanense, Leontice armeniaca, and Nandina domestica). The IRa-SSC and IRb-SSC boundaries are located within ycf1 and ycf1 pseudogenes (ψycf1), respectively. The ndhF genes, located downstream of the ycf1 pseudogene, are 37 bp–540 bp away from the IRb-SSC boundaries. There are 7–77 bp from trnH genes to IRa-LSC boundaries (Fig. S2). In contrast, only slight shifts are observed in interspecies comparisons among ten Mahonia plastid genomes (Fig. 4).

Fig. 4
figure 4

Comparison of the LSC, IR and SSC boundary regions of ten plastomes of the species of Mahonia

Identification of hypervariable regions

Genome-wide sliding window analysis among 20 Mahonia individuals was performed in order to calculate the nucleotide diversity (Pi) values and identify the highly variable regions (mutational hotspots). The Pi values across the whole plastid genomes range from 0 to 0.06285 (mean = 0.00205), and the accD region exhibits the highest diversity level (Fig. 5). The eight most hypervariable regions (Pi > 0.008) were identified: five (petNpsbM, ndhCtrnV, atpBrbcL, accD, rpl20clpP) are located in the LSC, and the other three (ycf1, ccsA–ndhD, ψycf1) in the SSC. None was found in the IR (Fig. 5). The Pi values of the eight hypervariable regions we extracted range from 0.00311 to 0.0974 (Table 2).

Fig. 5
figure 5

Sliding window analysis of the 20 plastomes of samples of Mahonia

Table 2 Sequence characteristics of eight highly variable regions among 20 plastomes

Repeated sequence analyses

We used MISA to detect the simple sequence repeats (SSRs) among ten species of Mahonia. The number of SSRs in each Mahonia plastid genome varies from 81 in M. aquifolium to 94 in M. cardiophylla T.S. Ying & Boufford RC604 [MZ158269] and M. shenii. Within these SSRs, mononucleotides are the most abundant (86.1%), followed by hexanucleotides and then by dinucleotides and trinucleotides. In addition, tetranucleotides and pentanucleotides appear rarely in plastid genomes (Fig. 6A). The lengths of all the SSRs range from 10 to 28 bp, and a majority of the SSRs units possess 10 base pairs (Fig. 6B). Most of the SSRs are distributed in LSC regions, and the SSRs located in SSC regions and IR are nearly equal in size (Fig. 6C).

Fig. 6
figure 6

Analyses of SSRs and repeated sequences in plastomes of ten species of Mahonia. A Frequency of microsatellites by the length of repeated units. B Frequency of microsatellites by length. C Frequency of all repeats by location. D Numbers of five different types of repeats. E Frequency of four types of dispersed repeats by length

Overall, a total of 208 tandem repeats were identified within the ten Mahonia plastid genomes. Each plastome contains 18 to 24 tandem repeats (Fig. 6D). We recorded 359 dispersed repeats in the ten plastid genomes of this genus. Each plastome includes 30 to 42 dispersed repeats. The forward repeats account for the largest proportion of dispersed repeats (59.6%), followed by palindromic repeats and then by reverse repeats. Moreover, complement repeats are often absent, except for M. japonica (Thunb.) DC. RC615 [MZ158274] and M. hancockiana Takeda RC613 [MZ158273] (Fig. 6D). The most common types of dispersed repeats range from 30 to 39 bp in length (Fig. 6E).

Phylogenetic analyses

In this study, nine alignment matrices were used to perform phylogenetic analyses using Bayesian inference (BI) and Maximum Likelihood (ML) method. The matrices consist of 20 ingroup accessions (Mahonia) and ten outgroup accessions. Notably, the ITS gene matrix includes only 28 samples, as the ITS sequences of Leontice armeniaca and Ranzania japonica are not available. The genus Mahonia is resolved as monophyletic and is sister to Berberis with strong support (bootstrap support (BS) ≥ 99%, posterior probabilities (PP) ≥ 0.99) across almost all trees (Figs. 7, S4, S5, S6). In the tree built using the complete plastid genome datasets, about 75 percent of the nodes are well supported (BS/PP = 99%/0.99). The phylogenetic trees exhibit that Mahonia species are grouped into four subclades. Subclade I comprises two species (M. pinnata, and M. aquifolium), which are both distributed in Western North America, while the species from the remaining subclades are native to East Asia. Subclade II contains a single species (M. polyodonta Fedde RC607 [MZ158279]). Subclade III consists of five species (M. nitens C.K. Schneid. RC605 [MZ158276], M. fortunei (Lindl.) Fedde [NC_042167], M. japonica (Thunb.) DC. RC615 [MZ158274], M. fordii C.K. Schneid. RC612 [MZ158271], and M. bodinieri Gagnep. RC608 [MZ158267]) with maximum PP support value (1.00). Regarding subclade III, the BI tree topology is not concordant with the ML tree topology. The remaining twelve individuals are gathered into subclade IV with high support values (BS/PP = 98%/1.00). Subclade I is the earliest-diverging lineage of Mahonia. Subclade IV is sister to subclade III, and together form a clade that is sister to subclade II (Fig. 7A).

Fig. 7
figure 7

Phylogenetic relationships of Mahonia inferred from Bayesian inference (BI) and maximum likelihood (ML) based on four datasets. A complete plastomes. B ITS sequences. C hypervariable regions. D rbcL + matK + trnH-psbA. The support values above the branches show PP (posterior probability)/BS (bootstrap support), and asterisks indicate 1.00/100%. Dashes represent incongruences of BI and ML trees

To test the conflicting signals between plastomes and ITS sequences, both BI tree and ML tree based on ITS datasets were generated and together compared with the trees based on plastomes (Fig. 7B). As shown in Fig. 7B, the genus Mahonia is also recovered as a clade with moderate support (BS/PP = 0.96/85). Subclades I and II in the plastome tree are completely congruent with subclades A and B in the ITS tree. The tree topologies outside subclades I and II are incongruent. Indeed, the tree based on ITS sequences possessed the highest number of polytomies and could not provide any valuable information to resolve the infrageneric relationships. However, given that support values at internal nodes are much higher than the external, we can properly cluster the several subclades into subclades III and IV (in the plastome tree). Subclade III is largely identical to subclade G. Subclade IV gathers the remaining clades (C, D, F and H). Notably, the positions of Mahonia fortunei Chung 3342 [KX549421], M. gracilipes (Oliv.) Fedde RC606 [MZ158272] + M. nitens (subclade E) in the ITS tree are severely in conflict with the plastome tree.

It is noteworthy that the tree topology based on eight concatenated hypervariable regions is mainly identical to the whole plastid genome tree. Whereas the phylogenetic relationships within subclade III and IV based on whole plastid genomes (Fig. 7A) are not fully consistent with the topology from the hotspots (Fig. 7C). The trees based on concatenated rbcL, matK, and trnH-psbA, have the lowest phylogenetic resolution (Fig. 7D). Furthermore, none of the phylogenetic reconstructions based on the concatenated rbcL, matK, and trnH-psbA datasets provides any evidence for the monophyly of Mahonia (Fig. 7D).

Based on the five datasets (coding, non-coding, LSC, SSC, and IR regions) extracted from the plastid genomes, the overall topology is consistent with the topology retrieved from the complete plastid genome datasets. However, support values are high mostly at deep nodes (Fig. S4).

Leaf morphological and micromorphological characteristics

The leaves of Mahonia are odd-pinnately compound. The adaxial surfaces of mature leaves are glossy for most species (Fig. 8). The leaflets show substantial diversity with respect to the number and shape among different species. Margins of each leaflet are variously toothed with coarse or fine spined serrations (Figs. 9A1–F1, S7A1–F1, S8A1–F1).

Fig. 8
figure 8

Morphological variations of compound leaves in Mahonia. A M. fordii. B M. napaulensis. C M. hancockiana. D M. duclouxiana. E M. eurybracteata. F M. bordinieri. G M. oiwakensis. H M. cardiophylla. I M. eurybracteata subsp. ganpinensis. J M. fortunei. K M. nitens. L M. gracilipes. M M. shenii. N M. breviracema. O M. polyodonta

The adaxial surfaces of epidermal cells are almost convex (Figs. 9A2–D2, S7A2, B2, D2), slightly convex (Figs. 9E2, F2, S7C2, E2, S8A2, B2). Fewer upper surfaces are flat or nearly so (Figs. S7F2, S8C2). Seven species (M. duclouxiana, M. cardiophylla, M. nitens, M. gracilipes, M. breviracema, M. eurybracteata subsp. ganpinensis, and M. pinnata) show epicuticular waxes on the adaxial side of their leaves (Figs. S7D2–F2, S8C2–F2). On the abaxial surface of leaflets, cells with irregular shape and stomatal apparatus are found. The anticlinal walls of lower epidermal cells are either mostly inconspicuous or prominently sinuous, almost stellate in appearance (Figs. 9A3, C3, S7D3). Epidermal cells surrounding stomata are usually sunken, resulting in uneven lower epidermis of leaflets. Wax ornamentations in the form of strips is found on the abaxial surface of M. hancockiana, M. breviracema and M. japonica leaves (Figs. S7D3, F3, F4, S8B3, B4). All the leaves we observed are hypostomatic (Figs. 9, S7, S8). The stomata are anomocytic (Figs. S7D4, S8B3–D3, F3), cyclocytic (Fig. 9D3) or actinocytic (Figs. 9A3–C3, S7A4, B4, E4). In M. bodinieri, M. polyodonta and M. nitens, the abaxial surfaces of leaf epidermis are so flat that we could not detect the cell boundaries and determine the type of stomatal apparatus (Figs. 9E4, F4, S8E4).

Fig. 9
figure 9

Characteristics of leaflets and epidermal surface. A1–A4 M. bealei. B1–B4 M. napaulensis. C1–C4 M. fortunei. D1–D4 M. eurybracteata. E1–E4 M. bordinieri. F1–F4 M. polyodonta. The images show leaflets, adaxial leaves, abaxial leaves and magnifying stomatal apparatus on the abaxial surface in each row from the left to right, respectively

Discussion

Comparative plastome of Mahonia

In the vast majority of flowering plants, complete plastomes share a similar structure comprising a large inverted repeat (IR), a large single copy (LSC) and a small single copy (SSC), respectively ~ 25 kb, ~ 87 kb and ~ 18 kb in length (e.g., Diphylleia Michaux, Dysosma Woodson, Podophyllum L., Sinopodophyllum Ying [(Berberidaceae)] [35]; Maddenia Hook. f. & Thoms, [36]; almost all genera of Styracaceae, [37]). The ebb and flow of IR are not unusual in evolutionary history [38]. IR recognition can display length divergence in angiosperm plastid genomes [39,40,41]. For instance, Pelargonium transvaalense R. Knuth possesses the largest known IR regions with ~ 88 kb in length [42]. More than 10 kb IR expansion was found in Nicotiana acuminata (Graham) Hook. (Solanaceae) [43]. In Trochodendrales, an IR expansion of about 4 kb was observed in both genera Trochodendron Siebold & Zucc. and Tetracentron Oliv. [20, 44]. A possible mechanism for these large and small IR expansions is double-strand DNA break and combination, and gene conversion, respectively [38].

Based on a chloroplast restriction site mapping study, Kim and Jansen [19] proposed that there was a large-scale (ca. 10 kb) IR expansion in the plastomes of Berberideae (Berberis and Mahonia). Ma et al. [20] conducted a comparative plastid genome analysis among four species of Ranunculales, finding that the genome size of Mahonia bealei was about 4.9–9.7 kb larger than the other three species (Nandina domestica [Berberidaceae], Megaleranthis saniculifolia Ohwi and Ranunculus macranthus Scheele [Ranunculaceae]). They inferred that a large IR expansion is the main cause of the significant increase in genome size in Mahonia bealei [20]. A similar 10 kb IR expansion has been described in Ranzania japonica [45] (GenBank ID: MG234280), although this result was not supported in other studies [46]. The plastome structure of Ranzania japonica is controversial and deserves further investigation. We present here the first comprehensive plastome analysis in Mahonia and show that there is a large IR expansion in the plastid genome of all species we investigated. Further intergeneric comparative plastome analyses have attested that a large-scale IR expansion was present in Berberideae [19, 46].

Although sequences in IR regions are commonly well conserved in comparison with single-copy (SC) regions, the IR-SC junctions are relatively variable. As shown in Fig. S2, neither large (> 500 bp) expansions nor contractions are recorded in the plastome of Mahonia species, except within the junctions of IRb-LSC. The IRb regions deeply expand into LSC regions reaching 10 kb in length, resulting in the IRb-LSC boundaries being located upstream of the gene psbB. For many angiosperm plastomes, the IRb-LSC boundaries are located in the gene rps19 [36, 37, 47]. Previous studies have concluded that these variations at IR-SC boundaries can provide more information for elucidating the evolutionary patterns of closely related species [48] and selecting potential phylogenetic molecular markers [49].

In Mahonia, as SSRs and dispersed repeats show abundant variations among different species, they could be developed into molecular markers in the future.

Phylogenetic analysis

In all phylogenetic trees we reconstructed (Figs. 7, S4, S5, S6), and except for the one based on the concatenation of the three common DNA barcodes (Fig. 7D), the monophyletic genus Mahonia was shown to be sister to Berberis with high support values, corroborating the results of previous studies [1, 2, 8]. Furthermore, in the plastome tree (Fig. 7A), the relationships among most clades are well resolved (PP = 1.00) implying the great power of using complete plastomes to address intractable phylogenetic relationships. Plastid phylogenomics of the family Berberidaceae [16] and of its different subordinate taxa have been studied in depth, including Podophylloideae [35], Epimedium [6], Berberis and Mahonia [2, 50]. Given the phylogeny topologies with strong support values, these results demonstrated the power of plastid phylogenomics for improving plastome-based phylogeny, investigating early-divergent events, and conducting taxonomic and plastome evolution analyses. Hsieh et al. [2] used 93 plastomes representing all 19 genera of Berberidaceae to resolve the long-standing disputable taxonomic issues of Berberidaceae. They also paid attention to the phylogeny and plastome structure of the tribe Berberideae, corroborating the considerable topological discordance between nrDNA and plastomes. Our phylogenomic analysis of the genus Mahonia based on more representative taxon sampling than previous studies, provides valuable genetic resources and improves our understanding of the relationships among phylogenetically challenging groups.

Determining the discordance between the topologies generated based on plastome and nuclear DNA has profound significance for clarifying the evolutionary events and evaluating the current phylogenetic frameworks generated by plastome datasets [51, 52]. In our study, we find significant discordance throughout the topologies of plastome tree and ITS tree in particular at deep nodes (Figs. 7A, B). Some nodes unexpectedly clustered with strong support values (e.g., subclades G and E in the ITS tree). These conflicts may be ascribed to ancestral hybridization events and/or incomplete lineage sorting [18, 52]. Exploring the source of discordant relationships is challenging especially in the hyper-diverse taxa, since radiations create opportunities for the evolutionary processes abovementioned [53]. Besides, we observed that the distribution of morphological characters is more congruent with the nuclear-based topology than with the plastome-based topology. Focusing on the ITS tree, the leaflets in subclade H exhibit a continuous morphological transition from linear to elliptic with several fine spined serrations at margins (Figs. 9C1, D1, S7D1, E1, S8A1). The leaflets of the two species M. gracilipes and M. nitens in subclade E show highly similar shapes, which are distinctive from the leaf shapes of all species we investigated (Figs. S8E1, F1). These results indicate that nuclear datasets may have broader implications for morphological character evolution, hybridization and/or incomplete lineage sorting.

Standard DNA barcodes have been shown to often lack sufficient variable characters and thus often fail to discriminate species among many lineages [50, 54,55,56]. Chen et al. [8] added two hypervariable plastid genes (accD and ndhF) and combined them with ITS, rbcL, matK, trnH-psbA. The combined ITS and plastid DNA dataset was used to conduct phylogenetic analyses, revealing that this kind of dataset could significantly improve the intergeneric resolution but had rarely power to address the interspecific phylogenetic relationships. In this paper, the concatenated universal DNA barcodes (rbcL + matK + trnH-psbA) expectedly failed to resolve the complex phylogenetic relationships among the species of Mahonia. On the contrary, based on eight concatenated hypervariable regions, the phylogenetic trees show a similar topological structure with the topology of the plastome trees and possess high support values. This phenomenon indicates that these hypervariable regions yield adequate information to address complicated phylogenetic relationships at the species level. Nowadays, an increasing number of studies consider the hypervariable regions to be valuable and introduce clade-specific barcodes (also named special barcodes) for phylogenetic purposes and even for the purpose of quick identification of medicinal plants [35, 36, 50, 57, 58]. Establishing clade-specific barcodes is far from easy as it depends on a series of factors, including the cost of whole plastome sequencing and sophisticated analytical tools [50]. Based on our results, we believe that the development of clade-specific barcodes has significant implications for species identification and biodiversity conservation in evolutionarily complex taxa [59].

An integrative method for distinguishing closely related species

Micromorphological characteristics are usually constant within species and could be used for detailed species identification [30, 33, 60,61,62]. Despite a high micromorphological similarity for vegetative and/or reproductive organs among closely related species, micromorphological characteristics, e.g., the structure of glandular trichomes (Arnebia and Lithospermum in Boraginaceae [63]), petal epidermal cell patterns (Berberidaceae [31]), palynological characters (Sanguisorba (Rosaceae) [56]), have exhibited great diagnostic value. For instance, the patterns of lemma epidermis are taxonomically discriminant and frequently used to elucidate the phylogenetic relationships among different genera of Poaceae [64,65,66]. The different leaf epidermal characters are congruent with the different clades in Cinnamomum (Lauraceae) retrieved by Huang et al. (2016) [67], implying phylogenetic significance [68]. In addition, Wu et al. (2010) performed a set of investigations about seed morphology (i.e., seed size, color and shape, seed coat ornamentations) for 24 species of Mahonia [30]. They found that although seed morphological characters are conserved at the genus level, they are diversified enough to enable the division of the genus Mahonia into nine types for further systematic studies. In our micromorphological study, the type of stomata, the shape of epidermal cells, the pattern of anticlinal walls and cuticular ornamentation show high diversity among different species. These characters could be regarded as complementary evidence, in addition to molecular data, to distinguish among species of Mahonia.

Given the efficiency and objectivity of molecular data, standard DNA barcodes are used as essential elements for discriminating plants [56]. However, standard DNA barcodes (ITS, concatenated rbcL, matK, and trnH-psbA) could not be used to distinguish the species in the genus of Mahonia, due to the limited diagnostic information.

In the era of NGS technology, an increasing number of research groups can afford the cost of whole plastid genome sequencing and then employ the data sets to resolve challenging phylogenetic relationships [55, 62, 69, 70]. In this context, we generated a robust phylogenetic framework for the species of Mahonia using plastome datasets. Complete plastid genomes encompass adequate sequence variations for detailed identification, but their sequencing encounters some problems: high sequencing cost, huge-scale datasets and sophisticated computational process.

Based on eight concatenated hypervariable regions, the topology of the phylogenetic trees we reconstructed is mostly congruent with the topology of the tree based on the whole plastomes. This indicates that these hypervariable regions have adequate information which is almost equal to the information contained in the whole plastome. We extracted and developed these hypervariable regions into special barcodes, which combine the advantages of standard DNA barcodes and whole plastome [36, 71]. However, molecular data including standard and special barcodes could possibly remain unsuccessful at distinguishing among closely related species, especially in young lineages and lineages hosting an evolutionary radiation.

Micromorphological evidence can be used to address the different alternatives in resolving the polytomies in the tree built based on the special barcodes. For instance, the phylogenetic analysis could not differentiate the three species (M. bordinieri, M. fordii, M. japonica) from each other (Fig. 7C). However, we find that the adaxial surfaces of the epidermal cells of M. bordinieri, M. fordii and M. japonica are slightly convex, convex, and waxy, respectively (Figs. 9E2, S7A2, F2). In addition, the shape of epidermal cells of M. bordinieri and M. fordii is subquadrate and irregular, respectively. We could not determine the cell shape of M. japonica because of its invisible or obscure cell boundaries. On the abaxial surface of leaflets, we found actinocytic stomata in M. fordii. Several epidermal cells get organized in the form of a rosette around the stomata (Fig. S7A4). Anomocytic stomata and annular stripe were observed on the abaxial leaflets of M. japonica (Fig. S7F4). The epidermal cells on the abaxial surface of M. bodinieri appear to be considerably flat, which is obviously different from the other two species (Fig. 9E4). Therefore, these epidermal features could be used to distinguish the closely related species. Given the absence of wax ornamentations on the adaxial leaf surface of M. bodinieri and M. fordii compared to M. japonica, we suspect that the first two species above-mentioned might have a closer relationship.

Also, this study contributes micromorphological evidence to resolve polytomies in the ITS tree (Fig. 7B). The leaflets of M. breviracema are distinctly different from the leaflets of M. shenii regarding surface convexity. The former is convex with wax ornamentation (Fig. S7D2), while the latter is slightly convex (Fig. S8A2). The distinction indicates that M. breviracema may be closer to M. bealei and M. fortunei, and M. shenii may be closer to the M. eurybracteata RC614 [MZ158270] and M. eurybracteata subsp. ganpinensis (Fig. 7B). We can also apply micromorphological traits to differentiate M. napaulensis, M. duclouxiana and M. cardiophylla clearly (Figs. 9B2, S8C2, D2). The epidermal cells on the adaxial surface of leaves in M. napaulensis are convex without wax ornamentation, whereas the other two have substantial wax ornamentations in the form of stripes. More detailed comparison reveals that the leaf surfaces of M. duclouxiana are much flatter than the leaf surfaces of M. cardiophylla. However, we prefer to consider a closer relationship between M. napaulensis and M. duclouxiana based on the similar shape of leaflets (Figs. 9B1, S8C1).

Conclusion

Based on the integration of molecular data from hypervariable regions and epidermal characters of leaflets, we can distinguish all the species we investigated within the genus Mahonia. Although our sampling for next-generation sequencing is not extensive enough to delimit species boundaries clearly, our results shed a light on the taxonomic, phylogenetic, and evolutionary analysis of the genus Mahonia. For further investigations, on the basis of a more comprehensive sampling, we propose an integrative method based on special barcodes and broader macroscopical evidence (e.g., morphological, micromorphological, anatomical and even cytological characteristics) to distinguish closely related species. Furthermore, genetically variable hotspots could be developed as clade-specific barcodes for efficient and rapid species identification especially in medicinal plants. It deserves ongoing and concerted efforts of the "barcode" research community to build a comprehensive system for accurately identifying plant species.

Methods

Taxon sampling, DNA extraction and next-generation sequencing

Fresh leaves from adult plants were collected from Sichuan, Yunnan, and Guizhou provinces of China and immediately dried using silica gel (Table 3). Liang Zhao from Herbarium of Northwest A&F University (WUK) undertook the formal identification of the vouchers and the plant materials used in our study. Voucher specimens of the plant materials we collected have been deposited in WUK. Dried leaves of four species of Mahonia (M. oiwakensis Hayata RC610 [MZ158277], M. japonica (Thunb.) DC. RC615 [MZ158274], M. aquifolium (Pursh) Nutt. RC616 [MZ158265] and M. pinnata (Lag.) Fedde RC618 [MZ158278]) were taken from voucher specimens of the Herbarium of Northwest A&F University (WUK). The 17 species sampled represented all the subclades of Mahonia and these species are from East Asia, Western North America and Europe, where it was inferred to be the center of diversity for Mahonia [8]. For this study, complete plastomes were obtained from 17 species of Mahonia. In addition, we obtained from GenBank plastomes of three Mahonia species (M. eurybracteata subsp. ganpinensis (H. Lév.) Fedde, GenBank ID: MN417307, M. fortunei, GenBank ID: NC_042167, and M. bealei, GenBank ID: MH795308) and plastomes for ten outgroup species to conduct subsequent comparative and phylogenetic analyses (Table 3). Total genomic DNAs were extracted from dried leaves using Cetyltrimethylammonium Bromide (CTAB) method [72] and sequenced using the Illumina Miseq platform (Illumina, San Diego, California, USA) at the Beijing Genomics Institute (BGI). Paired-end sequence reads have been trimmed to remove low-quality reads and adapter sequences using Trimmomatic v0.40 [73].

Table 3 Voucher information and GenBank accession numbers for Mahonia and outgroups

Plastome assembly, annotation and visualization

We obtained approximately 2 GB high-quality data for each sample. The quality-filtered reads were then subjected to de novo assembling with GetOrganelle [74] or NOVOPlasty v4.3 [75], using Mahonia oiwakensis (GenBank ID: MN735221) as a reference for assembly. The 17 newly assembled plastome sequences were deposited in GenBank. We also submitted all the raw sequence data to GenBank and obtained SRA accessions (Table 3).

We annotated the 17 assembled plastomes with default parameters using Plastid Genome Annotator [76] (PGA) and inspected the accuracy of annotations with the annotation results from GeSeq [77]. On the basis of the results from PGA, we corrected the errors using Geneious v11.0.2 [78]. We checked the annotations of tRNA using tRNAscan-SE v2.0 [79]. The circular plastome maps of Mahonia were plotted using online OGDRAW [80].

Comparative genomic analysis

Plastomes of ten Mahonia species were selected for further comparative genomics analyses and also repeated sequence identification. The ten species were representative of the different clades of Mahonia. We aligned the ten representative plastomes of Mahonia using MAFFT v7.450 [81] and adjusted the boundaries in Geneious. Following the same procedure, we aligned seven plastomes from different genera of Berberidaceae for further comparison of intergeneric sequence identity (see the details from section Results). Then, the two aligned matrices were visualized using online mVISTA program [82] under Shuffle-LAGAN mode with default options for other parameters. In both cases, the reference sequence was M. bealei RC601 [MZ158266]. Gene rearrangement events in Mahonia were detected using Mauve v2.4.0 [83].

Using Geneious, we compared the construction of ten representative plastomes of Mahonia and seven plastomes from different genera in Berberidaceae mentioned above. The IR-SC boundaries of plastomes of the species of Mahonia and of the outgroup species were manually detected and plotted.

We employed DnaSP v5.10 [84] to detect the plastid genome divergence and parsimony informative sites among 20 individuals (19 species) of Mahonia. A sliding window analysis (window length = 600, step size = 200) allowed us to determine hypervariable regions and estimate the level of polymorphism for subsequent phylogenetic analyses.

Repeated sequence identification

Microsatellites (SSRs) were identified by MISA [85] with the thresholds of ten repeated units, and 6, 5, 5, 5, 4 repeated units for mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSRs, respectively. We used the online Tandem Repeats Finder [86] to find the tandem repeated sequences with the default settings. REPuter program [87] was used to identify the dispersed repeated sequences, including forward, reverse, complement, and palindromic repeats. The minimum repeated size and Hamming distance were set at 30 bp and three (i.e., 90% sequence identity), respectively.

Phylogenetic analyses

Phylogenetic analyses were made based on 17 newly sequenced complete plastomes of Mahonia and 13 already published plastid genomes (three species from Mahonia and ten species from Berberidaceae and Ranunculaceae for outgroup species). We aligned the plastid genomes and ITS sequences using MAFFT. Phylogenies were reconstructed based on the following datasets: (1) complete plastid genomes; (2) large-single-copy (LSC) region; (3) small-single-copy (SSC) region; (4) one inverted repeat (IR); (5) coding sequences; (6) non-coding sequences; (7) ITS; (8) concatenated sequences of matK, rbcL, and trnH-psbA; and (9) concatenated sequences of eight identified hypervariable regions. We applied the Maximum Likelihood (ML) method and Bayesian inference (BI) for each of the nine datasets to reconstruct phylogenetic trees, respectively. The ML analysis was carried out using RAxML-HPC Black Box [88] on the Cyberinfrastructure for Phylogenetic Research (CIPRES) Science Gateway [89], with 1000 bootstrap replicates and a GTRGAMMA + I model to obtain support values. jModelTest [90] was utilized to compute the best-fit model using the Akaike information criterion (AICc) for each partition, which was also conducted at the CIPRES Science Gateway (Table S3). BI trees were generated with MrBayes v3.2 [91]. The Markov chain Monte Carlo (MCMC) analysis was run for 10,000,000 generations and sampled every 1,000 generations. The first 25% trees were discarded as burn-in. The remaining trees were used to estimate the consensus tree and the Bayesian posterior probabilities.

Recording of morphological and micromorphological character states

Images of mature leaves were taken with a Nikon 7100 camera (Nikon, Japan). Fresh leaves were first fixed in FAA (methanol: acetic acid: ethanol: water = 10:5:50:35). Next, small leaf pieces were dehydrated in an increasing alcohol series and isoamyl acetate series, and then, critical-point dried in CO2 with a K850 critical-point dryer (EMITECH, Ashford, England). Leaf pieces were then mounted on stubs and sputter coated with gold–palladium using a JS-1600 sputter coater (HTCY, China). The materials were photographed with a Hitachi S-3400 scanning electron microscope (SEM, Hitachi, Japan) at 15 kV.