APPLICATION OF ORDINAL CLUSTERING TO THE TAXONOMY OF THE GENUS ENTADA (FABACEAE) IN TAIWAN

The importance of a mathematically correct treatment of ordinal information has not been sufficiently emphasized. Since ordinal cluster analysis (OrdClAn) and non-metric multidimensional scaling (NMDS) are order-invariant, these two methods were applied to evaluate the previous taxonomic treatments of three species of Entada Adans. (Fabaceae) in Taiwan. The genus Entada is represented in Taiwan by E. koshunensis Hayata et Kanehira, E. phaseoloides (L.) Merr. and E. rheedii Spreng. In this study, 29 morphological characters and 14 operational taxonomic units (OTUs) representing these three species were selected to construct a primary mixed data matrix. Because the herbarium specimens were few and incomplete, specimens from the same locality were combined, so that 62 specimens preserved in several Taiwan herbaria formed the 14 OTUs. The characters were measured on binary nominal, multistate nominal, ordinal, and quantitative scales. OrdClAn and NMDS based on Gower’s index were used to evaluate the relationships among the three Entada species from the primary matrix. The cluster dendrogram and the ordination of OTUs showed that the genus Entada in Taiwan can be divided into three groups representing the three existing species, in agreement with previous taxonomic analyses.
Key words: Entada, Non-metric multidimensional scaling, Numerical taxonomy, Ordinal cluster analysis, Ordinal scaling
DOI: 10.3329/bjpt.v14i2.529
Bangladesh J. Plant Taxon. 14(2): 93-100, 2007 (December)


Introduction
Nominal, ordinal, interval and ratio data types are usually used in numerical taxonomy or vegetation ecology (Anderberg 1973). The methodological sequence implied by exploratory analysis of ecological data includes sampling, data collection, resemblance matrix calculation, and classification or ordination (Podani 2005). Each move between these levels of abstraction can be carried out by applying either an ordinal (O, sequence-/rank-based) or a metric (M, difference-/ratio-based) scale, and methodological paths such as M-M-O, O-M-O, O-M-M, O-O-M, M-O-M, M-O-O and O-O-O are implied. In most of these paths ordinal and metric properties are confounded. When switching from the M scale to the O scale, some information is always lost and the actual differences between distances are neglected, while in changing from the O scale to the M scale, the apparent increase of information may not be mathematically correct (Krauth 1986, Dale 1989, Podani 2005). Future multivariate analysis should be ordinal in nature so that the order of dissimilarities can be considered by the algorithm. Thus, the sequence O-O-O is the best combination for the processing of ordinal data because this path maintains the consistency of the analysis and the overall statistical precision. However, very few reports in the past decade have emphasized the importance of a correct treatment of ordinal information in exploratory data analysis.
Multivariate analysis is most problematic when ordinal variables appear together with other scale types in the data. However, this obstacle can be overcome by extending Gower's general coefficient of similarity to ordinal data types, which makes clustering and multidimensional scaling possible (Podani 1999). Ordinal cluster analysis (OrdClAn) proceeds in the same manner as an agglomerative clustering method. Non-metric multidimensional scaling (NMDS) (Kruskal 1964) relies on the ordinal information and represents a good alternative to the metric procedures (Podani 2000, 2005). In NMDS, any change in the dissimilarity matrix of objects that preserves the rank order of the dissimilarities has no impact on the final ordination, and the first two axes usually provide a fairly good representation of objects or variables. Gordon (1999) suggested that the name ordinal scaling is more appropriate than non-metric scaling. These two methods are order invariant, and this property ensures that the results are independent of changes not affecting ordinal relationships (Podani 1999, 2005, 2006).
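The order-invariance property can be illustrated with a minimal sketch in Python. It is not the SYN-TAX implementation; the function name `single_link_merges` and the four-object dissimilarities are hypothetical values chosen only for illustration. The sketch applies a strictly increasing transformation to all dissimilarities and checks that single-link clustering produces the same sequence of merges.

```python
from itertools import combinations

def single_link_merges(d):
    """Return the sequence of merges produced by single-link clustering.

    d maps frozenset({i, j}) -> dissimilarity for every pair of objects.
    Each merge is recorded as a frozenset holding the two merged clusters.
    """
    objects = sorted({x for pair in d for x in pair})
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        # Single link: cluster distance = smallest member-to-member dissimilarity.
        a, b = min(
            combinations(clusters, 2),
            key=lambda ab: min(d[frozenset((x, y))] for x in ab[0] for y in ab[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(frozenset((a, b)))
    return merges

# Hypothetical dissimilarities among four objects (illustrative values only).
d = {
    frozenset("AB"): 0.10, frozenset("AC"): 0.60, frozenset("AD"): 0.70,
    frozenset("BC"): 0.50, frozenset("BD"): 0.80, frozenset("CD"): 0.20,
}
# A strictly increasing transformation keeps the rank order of all values.
d_transformed = {pair: value ** 3 + 1.0 for pair, value in d.items()}

same_topology = single_link_merges(d) == single_link_merges(d_transformed)
print(same_topology)
```

Only the merge heights change under such a transformation; the sequence of fusions, and hence the tree topology, stays the same. A method that averages dissimilarities would not generally share this property.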
In Taiwan, three species of Entada Adans. (Fabaceae) occur, namely E. koshunensis Hayata et Kanehira, E. phaseoloides (L.) Merr. and E. rheedii Spreng. (Yang et al. 2005), of which E. koshunensis is endemic. Entada is frequently found among riverine vegetation and the segments of its pods are dispersed by water. The terminal pinnae of the Entada species are modified into tendrils and the stems are often more than 30 cm in diameter. The plant can grow extremely quickly towards the light and has the ability to entangle trees (Nielsen 1992). The leaflets and flowers of the three Entada species are very similar in shape, and the specimens preserved in Taiwan herbaria are few and incomplete; both factors clearly complicate the identification of the species. The aims of the present study are to provide a primary mixed data matrix (Podani 2001) and to use multivariate analysis by OrdClAn and NMDS to evaluate the results of previous taxonomic treatments (Ho 1985, Huang and Ohashi 1993, Yang et al. 2005).

Materials and Methods
Sixty-two voucher specimens of Entada species deposited in three herbaria of Taiwan, namely Provincial Pingtung Institute (PPI), National Taiwan University (TAI) and Taiwan National Museum (TNM), were used, and specimens from the same locality were combined to represent the fourteen operational taxonomic units (OTUs). Twenty-nine morphological characters (character states, variables) were selected to distinguish the three species of the genus (Table 1). OTUs 1-5 were identified as E. koshunensis, OTUs 6-10 as E. phaseoloides, and OTUs 11-14 as E. rheedii. The characters were measured on binary nominal, multistate nominal, ordinal, and ratio scales. The binary nominal data type involves a categorization without numerical values or ranks. It has two states, such as presence/absence or black/white, coded as 1/0. The multistate nominal data type involves three or more states, such as red/blue/black. Ordinal data can be placed in rank order along a continuum. Some of the abundance scales used for recording vegetation data belong to the ordinal data type. The ratio data type includes real quantitative values, and thus the differences between values of objects can be compared (Podani 1994). For example, the presence of   an average from five replicate samples from the same OTU. A primary mixed data matrix of 14 objects (OTUs) and 29 character states (Table 2) was constructed for the exploratory analysis.
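To make the handling of the four scale types concrete, the following Python sketch computes a Gower-type dissimilarity for a small mixed data matrix. It rests on stated assumptions: the three objects and their values are hypothetical, binary characters are treated symmetrically, and ordinal characters are simply replaced by average ranks and range-scaled. This is a simplification and does not reproduce the tie correction of the ordinal extension described by Podani (1999).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def gower_dissimilarity(rows, scales):
    """Pairwise Gower-type dissimilarity matrix for a mixed data matrix.

    rows: list of objects, each a list of character values.
    scales: one code per character, using the codes of Table 1:
            'B' binary, 'M' multistate nominal, 'O' ordinal, 'Q' quantitative.
    """
    n, p = len(rows), len(scales)
    columns = [[row[k] for row in rows] for k in range(p)]
    # Assumption: ordinal characters are replaced by ranks, then range-scaled.
    columns = [average_ranks(col) if s == "O" else col for col, s in zip(columns, scales)]
    ranges = [max(col) - min(col) if s in "OQ" else None for col, s in zip(columns, scales)]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k, s in enumerate(scales):
                a, b = columns[k][i], columns[k][j]
                if s in "BM":
                    total += 0.0 if a == b else 1.0  # simple mismatch
                elif ranges[k] > 0:
                    total += abs(a - b) / ranges[k]  # range-scaled difference
            dist[i][j] = dist[j][i] = total / p
    return dist

# Hypothetical objects: one binary, one multistate, one ordinal, one ratio character.
rows = [
    [1, "ovate", 1, 2.0],
    [1, "oblong", 2, 3.0],
    [0, "oblong", 3, 6.0],
]
scales = ["B", "M", "O", "Q"]
dist = gower_dissimilarity(rows, scales)
print(dist[0][1], dist[0][2], dist[1][2])
```

Each character contributes a value between 0 and 1, so characters measured on different scales receive equal weight in the averaged dissimilarity.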
Gower's formula was used to convert the primary mixed data into dissimilarities, which were in turn subjected to NMDS and OrdClAn (program SYN-TAX 2000, Podani 2001). The single link (SL, nearest neighbor) method was selected because SL possesses the property of order invariance (Hubert 1973, Boberg and Salakoski 1993). A dendrogram of the ordinal cluster analysis of Taiwan Entada taxa based on 29 character states of 14 OTUs was constructed. To evaluate the correlation between the dendrogram and the distance matrix, the cophenetic correlation coefficient was calculated: the higher the correlation, the better the representation of distances in the hierarchy. If the value is larger than 0.9, the correlation is high; if the value is lower than 0.74, the correlation is not significant (Sneath and Sokal 1973). The NMDS results show the OTU distribution pattern on the first two, most important dimensions, together with the final stress value. A stress value of zero indicates a perfect fit of the ranked ordination distances to the original distances.
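The cophenetic correlation described above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the SYN-TAX routine: the function names and the 4 x 4 dissimilarity matrix are hypothetical, and the correlation is taken to be the ordinary Pearson coefficient between the original dissimilarities and the single-link merge heights.

```python
from itertools import combinations
from math import sqrt

def single_link_cophenetic(dist):
    """Cophenetic (merge-height) matrix of single-link clustering on a square matrix."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Find the two clusters whose closest members are nearest to each other.
        height, a, b = min(
            (min(dist[x][y] for x in ca for y in cb), ia, ib)
            for (ia, ca), (ib, cb) in combinations(enumerate(clusters), 2)
        )
        # Every pair split between the two clusters joins at this height.
        for x in clusters[a]:
            for y in clusters[b]:
                coph[x][y] = coph[y][x] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return coph

def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic values (upper triangle)."""
    pairs = list(combinations(range(len(dist)), 2))
    xs = [dist[i][j] for i, j in pairs]
    ys = [coph[i][j] for i, j in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Hypothetical dissimilarity matrix for four objects (illustrative values only).
dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
]
coph = single_link_cophenetic(dist)
r = cophenetic_correlation(dist, coph)
print(round(r, 3))
```

Because single link records only the smallest dissimilarity between two clusters, all between-cluster pairs share one merge height, which is why the coefficient stays below 1 even for well-separated groups.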

Results and Discussion
The dendrogram obtained by OrdClAn is presented in Fig. 1. Three groups are recognized when the threshold line is placed between the dissimilarity values of 0.288 and 0.378. The first group is composed of OTUs 1-5, the second group comprises OTUs 6-10, and the third group is composed of OTUs 11-14. The cophenetic correlation coefficient of 0.933 shows a significant correlation between the dendrogram and the distance matrix.
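Reading groups off a single-link dendrogram at a threshold line is equivalent to finding the connected components of the graph that links every pair of objects whose dissimilarity does not exceed the threshold. The Python sketch below illustrates this with a hypothetical four-object matrix; the function name `groups_at_threshold` and the values are assumptions made for illustration and are not the Entada data.

```python
def groups_at_threshold(dist, threshold):
    """Single-link groups at a given threshold, found as connected components."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # join the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical dissimilarity matrix for four objects (illustrative values only).
dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
]
two_groups = groups_at_threshold(dist, 0.3)
print(two_groups)
```

Any threshold that falls in the gap between two consecutive fusion levels yields the same partition, so the exact position of the line within such a gap does not affect the groups obtained.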
The NMDS solutions for the first two dimensions are shown in Fig. 2. The final stress of 0.0304 indicates that the rank order of distances in the new space follows the rank order of the original dissimilarities very closely. The points representing E. koshunensis are distributed along the middle-lower side; points in the upper-left of the ordination are E. phaseoloides; points in the upper-right of the ordination are E. rheedii. This OTU ordination (Fig. 2) agrees closely with the dendrogram obtained by OrdClAn (Fig. 1).
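How a stress value measures the agreement of rank orders can be sketched as follows. The Python code assumes Kruskal's stress formula 1 with a monotone (isotonic) regression of the ordination distances on the rank order of the dissimilarities; whether SYN-TAX uses exactly this variant is not stated in the text. Ties among dissimilarities are simply left in input order, and all numerical values are hypothetical.

```python
from math import sqrt

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Pool backwards while a block mean exceeds the mean of the next block.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress(dissimilarities, config_distances):
    """Stress-1 between pairwise dissimilarities and ordination distances.

    Both arguments list the same object pairs in the same order.
    """
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    d = [config_distances[k] for k in order]  # distances in rank order of dissimilarity
    d_hat = isotonic_fit(d)                   # best monotone approximation
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a ** 2 for a in d)
    return sqrt(numerator / denominator)

# Hypothetical dissimilarities for six object pairs (illustrative values only).
dissimilarities = [0.1, 0.6, 0.7, 0.5, 0.8, 0.2]
# Ordination distances with the same rank order give zero stress.
perfect = kruskal_stress(dissimilarities, [1.0, 4.0, 5.0, 3.0, 6.0, 2.0])
# Swapping the order of two distances produces a small positive stress.
imperfect = kruskal_stress(dissimilarities, [1.0, 3.0, 5.0, 4.0, 6.0, 2.0])
print(perfect, imperfect)
```

Only the rank order of the dissimilarities enters the calculation, so any order-preserving rescaling of the dissimilarities leaves the stress unchanged.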
Because the specimens of Entada in Taiwan herbaria are few and incomplete, opinions concerning the taxa of the genus in Taiwan have been controversial (Ho 1985, Huang and Ohashi 1993, Yang et al. 2005). The results from OrdClAn and NMDS indicate that three groups can be clearly distinguished from one another and that the endemic species E. koshunensis is indeed different from the others and truly exists in Taiwan. This result is close to the results of Yang et al. (2005).
OTU 8 is composed of three specimens, which were collected from Mainland China and deposited in the TNM herbarium through specimen exchange. The point of OTU 8 lies in the upper-left near the second group (E. phaseoloides) (Fig. 2). These three specimens do differ in some characteristics, such as leaflet size and shape; however, the evidence is not clear enough to separate OTU 8 from the second group. The differences may be influenced by the geographical location or the environmental conditions of the microhabitat. We suggest that these three specimens belong to the species E. phaseoloides. Nielsen (1992) showed the ranges of various species of Entada in Asia, but in his work E. phaseoloides was not distributed in Taiwan. Although the five OTUs of the second group differ in a few characters (Fig. 2), the present study confirms that the range of E. phaseoloides includes the Taiwan area.
Deciding on a correct treatment of ordinal information is still complicated by contrasting opinions presented in the literature (e.g. Podani 2006, van der Maarel 2007). However, the dissimilarity coefficient should be compatible with ordinal variables, and the subsequent ordination or clustering methods should consider only the rank order of dissimilarities (Podani 2005). The idea of the O-O-O sequence is very important for multivariate exploratory analysis, and combining the ordinal extension of Gower's general coefficient of similarity with OrdClAn and NMDS follows this path and yields good results. According to the OrdClAn and NMDS results, Entada in Taiwan clearly comprises three distinct taxa. Entada koshunensis truly exists in the fragmented land of the Hengchun Peninsula with few populations, E. phaseoloides is widely distributed in central Taiwan and is more abundant than the other two species, and E. rheedii is rare and found only in southern Taiwan.

Fig. 1. Dendrogram of ordinal cluster analysis of Taiwan Entada taxa based on 29 character states of 14 operational taxonomic units (OTUs). The cophenetic correlation coefficient = 0.933, showing a significant correlation between the dendrogram and the distance matrix.

Table 1. Four scales describing 29 character states of the genus Entada in Taiwan. B: binary nominal, M: multistate nominal, O: ordinal, Q: quantitative.
thickness, and endocarp thickness are ratio scale. The ratio scale variables are obtained as