THE IMPORTANCE IN DNA BARCODING OF THE REGIONS COVERING rRNA GENES AND ITS SEQUENCES IN THE GENUS QUERCUS L.

Turkey, with 18 oak (Quercus) species, is one of the richest countries in terms of species number and diversity. The most important reason for this species diversity is Turkey's location and geomorphological structure, which increase climatic effects and separate the country into different phytogeographic regions. Furthermore, hybridization, which is frequently observed between oak species, together with genetic drift, gene flow and ecological factors, causes morphological variation in plant species. All of these factors make it difficult to define the species concept for plant groups like oaks. Therefore, sequences of two regions, the first covering the 18S rRNA gene/ ITS1/ 5.8S rRNA gene/ ITS2/ 25S rRNA gene and the second covering the intergenic spacer (IGS)/ 5S rRNA gene, were obtained from GenBank and used as a tool for determining and resolving the phylogenetic relations of taxonomically problematic species. The two barcoding regions were also compared with each other according to their species recognition ability for oak species. As a result, it can be stated that both barcoding regions have a high number of variable sites, based on sequence information, to identify the species and evaluate the relationships of the species studied.
*Corresponding author, E-mail: aykut.yilmaz@usak.edu.tr


Introduction
DNA barcoding is a very important molecular approach for describing biodiversity, for evolutionary studies and especially for identifying taxonomically problematic species. DNA sequences preferred for DNA barcoding must have sufficient variability both to group species according to common characteristics and to separate taxonomically closely related species. Therefore, in past years, different sequence regions belonging to genomic and plastid DNA have been tested in order to find the best regions for DNA barcoding. A universal barcoding system would be a valuable resource for unambiguous species recognition in terms of speed, low cost and reliability (Piredda et al., 2011).
A short DNA sequence that contains sufficient variation to distinguish species is used as a molecular marker in DNA barcoding (Kress and Erickson, 2007). In particular, the internal transcribed spacer (ITS) regions of the rDNA genes in genomic DNA are the sequences most commonly preferred for plant molecular systematic studies (Baldwin et al., 1995; Alvarez and Wendel, 2003; Bailey, 2003; Sramko, 2008; Sramko et al., 2014). In addition to the ITS region, the external transcribed spacers (ETS) and the intergenic spacer (IGS) are also widely utilized in phylogenetics.
The cytochrome c oxidase-1 (CO1) gene from the mitochondria has a sufficient nucleotide differentiation rate to identify many groups of animals and is routinely used as a universal barcode to identify new species (Hebert et al., 2003, 2004; Greenstone et al., 2005; Ward et al., 2005; Smith et al., 2006; Piredda et al., 2011; Hürkan, 2017), but this rate is relatively low and unsuitable in plants (Chase et al., 2005; Kress et al., 2005; Fazekas et al., 2008; Hollingsworth et al., 2009). Therefore, alternative barcode regions that can be universally successful in all species should be screened; however, such a barcode region has not yet been found (Chase and Fay, 2009; Hollingsworth et al., 2009).
Many regions of the chloroplast genome have recently been used as an effective barcoding strategy to resolve problems and relationships at the species level in plants. Nevertheless, there is still much debate about the most suitable chloroplast regions to use, and no single barcoding region is available for all plant groups. Barcoding regions used in combination, or the whole chloroplast genome, could provide enormous data and specificity for universal barcoding in plants. Chloroplast regions and region combinations such as rbcL, matK, trnK, trnH-psbA, atpB-rbcL and trnT-trnF are commonly and effectively used for plant phylogenetic analysis.
As a result, barcoding is a useful tool for determining and resolving the phylogenetic relations of taxonomically problematic species. The genus Quercus, represented by over 500 species in the northern hemisphere, shows high phenotypic variation with natural hybrids (Manos et al., 2001; Borazan and Babaç, 2003; Yılmaz, 2018a).
The most important reason for the high species diversity in Turkey is its location and geomorphological structure, which increase climatic effects and separate Turkey into different phytogeographic regions (Uslu and Bakış, 2012; Yılmaz, 2018b). Turkey lies between the Asian and European continents and serves as an important migration route for many plants and animals. Another factor affecting species diversity and number in Turkey is the Anatolian Diagonal, which divides Anatolia into eastern and western parts (Davis, 1971; Çıplak et al., 1993; Borazan and Babaç, 2003; Yılmaz, 2018a, b).
Furthermore, oak species can spread across wide geographic regions via wind and grow in mixed populations, which increases hybridization between species belonging to the same or different sections (Hokanson et al., 1993; Kremer and Petit, 1993; Bacilieri et al., 1996).
In addition to all these factors, the lack of sufficient diagnostic morphological characters, such that it is sometimes not possible to identify oak species because of high morphological variation (Denk and Grimm, 2010; Simeone et al., 2013), and the lack of ecological, historical and genetic investigations make the genus Quercus problematic in Turkey and, similarly, worldwide.
Hybridization, gene flow, genetic drift, ecological factors and epigenetic mechanisms cause morphological variation in plant species. The classical taxonomic system, which is based on the morphological similarity of individuals, makes it difficult to define the concept of biological species, especially for plant groups like oaks. Therefore, molecular markers are frequently preferred over morphological characters to identify oak species and to understand oak evolution (Oh and Manos, 2008; Denk and Grimm, 2010; Simeone et al., 2013; Yılmaz et al., 2013; Yılmaz, 2016). In particular, DNA barcoding has been used as the most useful tool in solving these problems.
The objective of this study is to evaluate the phylogenetic relationships of Quercus species by using sequences of the 18S rRNA gene/ ITS1/ 5.8S rRNA gene/ ITS2/ 25S rRNA gene and the intergenic spacer (IGS)/ 5S rRNA gene obtained from GenBank, and to compare these barcoding regions according to their species recognition ability.
Information on the studied taxa was obtained from the National Center for Biotechnology Information (NCBI). The studied taxa and GenBank codes for the rRNA gene regions analysed in this study are presented in Tables 1 and 5. Sixteen taxa for the first region, containing ITS1 and ITS2 together with the related rRNA genes, and 15 taxa for the second region, containing the IGS and the 5S rRNA gene, were selected and analysed for phylogenetic relations. Almost all of the taxa selected for this study, except for a few species, are from Turkey. While the locations of the 15 taxa studied for the first region (18S rRNA gene/ ITS1/ 5.8S rRNA gene/ ITS2/ 25S rRNA gene) are in Turkey except for three species, all taxa selected for the second region are from Turkey.

Sequence alignment and Phylogenetic analysis
Multiple sequence alignments for both regions were performed separately by using Molecular Evolutionary Genetics Analysis (MEGA). The probabilities of substitution from one base to another, the transition/transversion ratios for purines, for pyrimidines and overall, and the nucleotide frequencies were computed from the edited alignments (Tables 2-4 and 6-8).
Neighbour-joining dendrograms, with bootstrap values reported above the branches, were obtained with the MEGA X program for the two regions, namely the 18S rRNA gene/ ITS1/ 5.8S rRNA gene/ ITS2/ 25S rRNA gene and the IGS/ 5S rRNA gene (Figs 1-2). All positions containing gaps and missing data were eliminated (complete deletion option). Consequently, evolutionary analyses were conducted using a total of 283 positions in the final dataset for the IGS/ 5S rRNA gene region and a total of 690 positions in the final dataset for the 18S rRNA gene/ ITS1/ 5.8S rRNA gene/ ITS2/ 25S rRNA gene region.
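The complete deletion step described above can be illustrated with a minimal Python sketch. It is not the MEGA implementation: the function names and the toy sequences are hypothetical, and the sketch simply assumes that any column containing a gap or an ambiguous base in any sequence is dropped before variable sites are counted.

```python
def complete_deletion(alignment):
    """Keep only alignment columns in which every sequence has an unambiguous base."""
    valid = set("ACGTU")
    columns = zip(*(seq.upper() for seq in alignment))
    return [col for col in columns if all(base in valid for base in col)]


def count_variable_sites(columns):
    """Count the columns that contain more than one distinct base."""
    return sum(1 for col in columns if len(set(col)) > 1)


# Toy alignment (hypothetical sequences, not taken from GenBank).
toy_alignment = [
    "ACGT-ACGTA",
    "ACGTTACGCA",
    "ACCTTANGTA",
]

kept = complete_deletion(toy_alignment)
variable = count_variable_sites(kept)
print(len(kept), variable)  # prints "8 2"
```

In the toy alignment, the column with a gap and the column with an N are removed, so the number of retained positions (8) is smaller than the alignment length (10), in the same way that the final datasets in this study are shorter than the full alignments.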

Results and Discussion
Eighteen oak species belonging to three subgeneric sections (Quercus, Cerris and Ilex) currently occur in Turkey, which is one of the richest countries in terms of species diversity and number (Yaltirik, 1984). Two rDNA regions, for which sequence information was obtained from the NCBI taxonomy database, were used for phylogenetic analysis.
The fundamental aims of the study are, first, to evaluate the Quercus taxa from Turkey according to their phylogenetic relations and to contribute to the solution of taxonomic problems and, second, to compare two rDNA regions frequently used in DNA barcoding and determine which region gives the best results for barcoding.

Analysis results for region covering 18S rRNA gene/ ITS1/ 5.8S rRNA gene/ ITS2/ 25S rRNA gene
Valuable information about the taxonomy of the studied taxa was provided by the analysis of the first region. This genomic DNA region has quite wide sequence data, covering three rRNA genes (18S rRNA gene, 5.8S rRNA gene and 25S rRNA gene) and two spacer regions (ITS1 and ITS2), and gives information useful for plant systematics at the species and generic levels. This DNA region has an alignment length of 697 bp for the taxa studied and showed 94 variable sites. The studied taxa and the accession numbers obtained from NCBI are given in Table 1.
Table 2 shows the probability of substitution (r) from one base to another. For simplicity, the sum of r values is made equal to 100. Rates of the different transitional substitutions are shown in bold and those of transversional substitutions in italics in Table 2. This analysis involved 16 nucleotide sequences. All positions containing gaps and missing data were eliminated (complete deletion option). There were a total of 690 positions in the final dataset. Evolutionary analyses were conducted in MEGA X.
Moreover, transitional substitutions of the pyrimidines are higher than those of the purines (Table 2). In the comparison of purines (k1) and pyrimidines (k2) according to transition/transversion ratio, pyrimidines, with 13.02, show a higher value than purines (Table 3). The overall transition/transversion ratio (R) is 4.01 (R = [A*G*k1 + T*C*k2]/[(A+G)*(T+C)]) in the evaluation of all positions in the final dataset (Table 3). The nucleotide frequencies are 19.66% (A), 17.84% (T/U), 32.19% (C), and 30.30% (G) (Table 4).
It can be stated that the percentage of G and C bases for all studied Quercus taxa for the DNA region containing the 18S rRNA gene/ ITS1/ 5.8S rRNA gene/ ITS2/ 25S rRNA gene is higher than the percentage of A and T/U bases (Table 4).
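The formula for the overall transition/transversion ratio quoted above, R = [A*G*k1 + T*C*k2]/[(A+G)*(T+C)], can be evaluated directly from the nucleotide frequencies and the two rate ratios. The following Python sketch shows the arithmetic only. The function name is hypothetical, and because the purine ratio k1 for this region is not stated in the text, the value used for it below is a placeholder and not a reported result.

```python
def overall_ts_tv_ratio(freqs, k1, k2):
    """Evaluate R = [A*G*k1 + T*C*k2] / [(A+G)*(T+C)].

    freqs: nucleotide frequencies as fractions (keys "A", "T", "C", "G")
    k1:    transition/transversion rate ratio for purines
    k2:    transition/transversion rate ratio for pyrimidines
    """
    a, g, t, c = freqs["A"], freqs["G"], freqs["T"], freqs["C"]
    return (a * g * k1 + t * c * k2) / ((a + g) * (t + c))


# Sanity check: with equal base frequencies and k1 = k2 = 2, R equals 1.
equal = {"A": 0.25, "T": 0.25, "C": 0.25, "G": 0.25}
r_equal = overall_ts_tv_ratio(equal, k1=2.0, k2=2.0)
print(r_equal)  # prints 1.0

# Frequencies reported for the first region; k2 = 13.02 as reported.
reported = {"A": 0.1966, "T": 0.1784, "C": 0.3219, "G": 0.3030}
K1_PLACEHOLDER = 4.0  # hypothetical value, k1 is not given in the text
r_sketch = overall_ts_tv_ratio(reported, k1=K1_PLACEHOLDER, k2=13.02)
print(round(r_sketch, 2))
```

Because the pyrimidine term T*C*k2 is weighted by a large k2, it dominates the numerator for this region, which is consistent with the observation that pyrimidine transitions are the most frequent substitutions.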
A Neighbor-Joining (NJ) dendrogram was drawn to show the phylogenetic relations of the 16 Quercus taxa (Fig. 1). The evolutionary history was inferred using the Neighbor-Joining method (Saitou and Nei, 1987). The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method (Tamura et al., 2004) and are in the units of the number of base substitutions per site. The differences in the composition bias among sequences were considered in evolutionary comparisons (Tamura and Kumar, 2002). Examination of the NJ tree shows that the DNA region of interest has sufficient information for species separation and for the sectional grouping of species. The sequence data obtained from the studied taxa separate the species into three sections, Quercus, Ilex and Cerris (Fig. 1). All samples are clearly differentiated from each other. Furthermore, the NJ tree showed that this barcoding region, with 94 variable sites, has enough sequence information to compare and evaluate the Quercus taxa, especially taxonomically closely related species.

Analysis results for region covering IGS and 5S rRNA gene
The IGS/5S rRNA gene region has an alignment length of 398 bp and 115 variable sites for the taxa studied. The studied taxa and the accession numbers obtained from NCBI are given in Table 5. The IGS/5S rRNA gene sequences contain more variable sites than the 18S rRNA gene/ ITS1/ 5.8S rRNA gene/ ITS2/ 25S rRNA gene sequences. In other words, although the IGS/5S rRNA gene region has shorter DNA sequences, it carries more distinctive information for the Quercus taxa.
The probability of substitution (r) from one base to another is shown in Table 6. This analysis involved nucleotide sequences belonging to 15 Quercus taxa. All positions containing gaps and missing data were eliminated, and evolutionary analyses were conducted in MEGA X.
At 63.78, the rate of transitional substitutions is higher than that of transversional substitutions in terms of total base substitutions (Table 6). Furthermore, 65% of the transitional substitutions are substitutions between pyrimidines. The transition/transversion ratios of purines and pyrimidines are 2.87 and 4.00, respectively (Table 7). In other words, transitional substitutions are higher than transversional substitutions for both base groups. The overall transition/transversion ratio is 1.86 in the evaluation of all positions in the final dataset (Table 7).
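The distinction between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) can be made concrete with a short Python sketch. It only counts observed differences between two aligned sequences and is not the Maximum Composite Likelihood estimate that MEGA X reports in the tables; the function name and the toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitutions(seq_a, seq_b):
    """Count observed differences between two aligned sequences by type."""
    counts = {
        "purine_transitions": 0,      # A <-> G
        "pyrimidine_transitions": 0,  # C <-> T
        "transversions": 0,           # purine <-> pyrimidine
    }
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites and sites with gaps or ambiguous bases.
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        if x in PURINES and y in PURINES:
            counts["purine_transitions"] += 1
        elif x in PYRIMIDINES and y in PYRIMIDINES:
            counts["pyrimidine_transitions"] += 1
        else:
            counts["transversions"] += 1
    return counts


# Toy pair of aligned sequences (hypothetical, not taken from GenBank).
toy = classify_substitutions("ACGTACGT", "GTGCACTA")
print(toy)
```

For the toy pair, the sketch finds one purine transition, two pyrimidine transitions and two transversions.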
Finally, the phylogenetic relations of the 15 Quercus taxa were shown with a Neighbor-Joining (NJ) dendrogram (Fig. 2). The evolutionary distances were computed using the Maximum Composite Likelihood method. The NJ tree of this DNA region separated the studied taxa into three sectional groups and, in addition, had sufficient information for species separation and for the phylogenetic relations of species. Furthermore, the NJ tree showed that this barcoding region, like the other region studied, has enough sequence information to evaluate taxonomically problematic species such as the members of the genus Quercus.
In Turkey, oaks, which are represented by 18 species, have a wide geographical distribution and dominate most of the forests. Because of their wide geographical spread, variety of species and economic importance, oaks have been used for many purposes, such as food, furniture and especially fuel wood. This situation increases the taxonomic problems in the genus and makes species definition difficult. Additionally, the location of Turkey between the Asian and European continents serves as a migration route for many plants such as oaks. Weak reproductive barriers and mixed populations are observed between oak species in many regions. All of these factors may be the reason for the extensive hybridization, the morphological variation at the species level and the taxonomic problems.
The determination of successful barcoding regions for the genus Quercus would have a considerable effect on resolving existing taxonomic problems and on species-level identification. For this reason, two genomic DNA regions, containing the 18S rRNA gene/ITS1/5.8S rRNA gene/ITS2/25S rRNA gene and the intergenic spacer (IGS)/5S rRNA gene, proposed by the Consortium for the Barcode of Life (CBOL), were used as molecular markers and compared with each other. As a result, it can be stated that both barcoding regions have a high number of variable sites, based on sequence information, to identify the species and evaluate the relationships of the species studied. Furthermore, it was observed that the neighbour-joining dendrograms containing the full oak data for both barcoding regions separated the species into three sectional groups and also separated the studied taxa from each other.
When the phylogenetic relationships of the oaks, which are grouped into identical sections by both NJ dendrograms, are evaluated, it can be stated that the evolutionary distances among Q. ithaburensis subsp. macrolepis, Q. brantii, Q. trojana and Q. cerris, which belong to section Cerris, were similar for each barcoding region. It is also observed that section Ilex and section Cerris are phylogenetically closer to each other than to section Quercus for both barcoding regions. The comparison of the three species belonging to section Ilex (Q. coccifera, Q. ilex and Q. aucheri) shows that Q. ilex and Q. aucheri are closer to each other than to Q. coccifera. Similarly, Yılmaz et al. (2013) stated in a previous report on the DNA comparison of these three species from section Ilex that Q. ilex and Q. aucheri formed two close but separate groups and that populations of Q. coccifera showed more differences than populations of Q. ilex and Q. aucheri. In addition, another study based on chromosomal parameters of these three taxa, such as length range, haploid complement, and A1 and A2 values, showed results similar to those provided by the barcoding regions and supports the results of this study (Yılmaz, 2018b).
In a previous study, Denk and Grimm (2010) used ITS and 5S-IGS data to recognize the major infrageneric groups and the phylogenetic relationships among the species of Quercus from western Eurasia. However, in contrast to this study, the sequence regions encoding the 18S, 5.8S and 25S rRNA were excluded from the analyses by Denk and Grimm (2010). While the individuals of Q. pontica formed a distinct group in the study of Denk and Grimm (2010), the NJ dendrograms in this study showed that Q. pontica, evaluated within section Quercus, is the outermost species in comparison to the other species belonging to section Quercus.
The comparison of the alignment lengths and variable sites of the barcoding regions studied shows that, although the IGS-5S rRNA gene region, with an alignment length of 398 bp, is shorter than the other barcoding region, it exhibits more sequence variation, at 28.29%. However, when the sites with missing/ambiguous data and gaps were excluded for effective analyses, the IGS-5S rRNA gene region showed 16.83% variation for the studied taxa. In other words, it has a high amount of missing data in comparison to the other barcoding region, which has a sequence variation of 13.48%.
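One simple way to express sequence variation as a percentage is to divide the number of variable sites by the alignment length. The Python sketch below applies this to the counts reported earlier in the text (94 variable sites in 697 bp, and 115 variable sites in 398 bp). It assumes this is the definition behind the quoted percentages, which the text does not state explicitly; where the output differs from the figures quoted above, the cause (rounding, truncation or a different site count) is not given in the source. The function name is hypothetical.

```python
def variable_site_percentage(variable_sites, alignment_length):
    """Return variable sites as a percentage of the alignment length."""
    return 100.0 * variable_sites / alignment_length


# Counts as reported for the two barcoding regions.
its_pct = variable_site_percentage(94, 697)   # 18S/ITS1/5.8S/ITS2/25S region
igs_pct = variable_site_percentage(115, 398)  # IGS/5S rRNA gene region

print(f"{its_pct:.2f} {igs_pct:.2f}")  # prints "13.49 28.89"
```

Under this definition, the shorter IGS/5S rRNA gene region still shows roughly twice the proportion of variable sites of the longer region, which is the comparison the paragraph above relies on.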
Analyses exhibit missing/ambiguous data and gaps in the sequence alignment, especially for species whose sequence lengths differ because of regions containing wide deletions. All species of section Quercus analysed using the IGS-5S rRNA gene sequence information have such deleted regions in comparison to the species belonging to sections Cerris and Ilex. Denk and Grimm (2010) state that "A number of newly assembled and gene bank sequences include missing data due to the fact that a guanine-rich region within the 5′ ITS1 region can be difficult to sequence". For this reason, Denk and Grimm, who deposited in NCBI GenBank the sequence information used here, re-ran the sequencing to guarantee at least one completely sequenced ITS clone per individual.
It can be said that both barcoding regions hold important sequence information for species identification and for the evaluation of evolutionary relations in oaks, and both are recommended for further studies.
In Turkey, another important reason, besides hybridization, that makes the oaks difficult to understand is the lack of adequate conservation programs. Turkey is a very valuable country in terms of plant diversity, with 11000 taxa and a 35% endemism rate (Vural, 2003). Therefore, the results of the study are important for the determination of plant diversity and the conservation of genetic resources. In particular, the endemic taxa Q. aucheri, Q. vulcanica and Q. macranthera subsp. syspirensis are valuable resources because of their restricted distribution areas. While the distribution area of Q. aucheri is restricted to south-west Anatolia, Q. macranthera subsp. syspirensis, which is distributed in the north and north-east regions of Anatolia, has a wider distribution than Q. aucheri. The other endemic species, Q. vulcanica, distributed from 1200 to 2000 m altitude, has a more restricted area and isolated habitats, such as Isparta-Eğirdir (Yukari Gokdere village), Konya-Sultan Mountains and Kutahya-Turkmen Mountains, when compared with the other species. However, Q. vulcanica faces the threat of extinction. Furthermore, there are not enough protection programs for the conservation of oak biodiversity. This study contributes to the understanding of the biodiversity and genetic resources of oaks as well as of their phylogenetic relationships.