Abstract Detail



Floristics & Taxonomy

Saryan, Preeti [2], Gupta, Shubham [1], Gowda, Vinita [2].

Bringing clarity in complexes: A machine learning approach for unsupervised discovery and characterization of clusters using morphological data.

Species delimitation is central to taxonomy, and in classical taxonomy a careful assessment of morphological variation is critical for species assignment. This approach poses serious challenges in taxa that show high morphological variation (species complexes), because species boundaries among such taxa can be ambiguous. One common way to delimit species is to use classification-based methods, which require a prior assignment of species identity (supervised analysis); however, this prior assignment is known to add human bias. A second common way is to use unsupervised approaches, primarily ordination methods such as principal component analysis (PCA) and non-metric multidimensional scaling (NMDS). These methods have been used as visualization tools or as feature extraction tools for supervised learning algorithms such as linear discriminant analysis, but their use in taxonomic studies has been very limited. Here we present a robust method for discovering clusters within species complexes that not only identifies different ways to cluster the data in an unsupervised setting but can also be used to a) test the quality of clusters, and b) deduce the defining morphological features of each cluster. The genus Hedychium (Zingiberaceae) is known to have at least ten species complexes and has challenged taxonomists for decades. We measured 150 morphological characters, both vegetative and reproductive, for each taxon (n=5 to 20) from multiple populations belonging to two major complexes (Spicatum-complex and Coronarium-complex). We first performed spectral clustering, an unsupervised clustering method, to delimit species complexes. Next, we performed character analysis to identify the morphological characters governing these clusters. Unlike simple k-means clustering, the spectral clustering algorithm can discover arbitrarily shaped clusters even when standard kernels (such as the RBF kernel) are used.
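The contrast between k-means and spectral clustering can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses synthetic two-ring data in place of real morphological measurements, and the cluster count and RBF `gamma` value are illustrative choices only.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for a character matrix (assumption): two concentric
# rings, i.e. two groups that no straight boundary can separate.
X, y_true = make_circles(n_samples=300, factor=0.3, noise=0.02, random_state=0)

# k-means partitions the plane around two centroids, so it cuts both rings.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering with a standard RBF kernel. gamma=50.0 is an
# illustrative value chosen so that only nearby points are strongly
# connected; it is not a value taken from the abstract.
spectral_labels = SpectralClustering(
    n_clusters=2,
    affinity="rbf",
    gamma=50.0,
    assign_labels="kmeans",
    random_state=0,
).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the known groups,
# values near 0 mean agreement no better than chance.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_spectral = adjusted_rand_score(y_true, spectral_labels)
print(f"k-means ARI:  {ari_kmeans:.2f}")
print(f"spectral ARI: {ari_spectral:.2f}")
```

With real measurements, the characters would normally be standardized first, and both the kernel width and the number of clusters would need to be explored rather than fixed in advance.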
The spectral clustering algorithm suggested five, nine, and twelve clusters in the two complexes, where ten clusters were expected using classical taxonomic tools. Character analysis (mutual information between characters and clusters, correlation, and t-test) identified at least four characters that define the clusters: lateral staminode ratio, labellum ratio, notch, and notch-labellum ratio. We also found that ecological characters, such as the number of flowers opening per day, were important for describing one of the clusters, highlighting the importance of ecology in delimiting species. Spectral clustering facilitated the discovery of species boundaries in two species complexes within the genus Hedychium in an unsupervised setting, and character analysis provided an explanation of their biological significance.
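The character analysis described above can be sketched in a hedged way as follows. This is an assumption-laden illustration, not the authors' code: it uses scikit-learn's `mutual_info_classif` and SciPy's `ttest_ind`, the data are randomly generated, and the character names `char_0` to `char_3` are hypothetical placeholders rather than the Hedychium characters.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n_per_cluster = 100

# Cluster labels as they might come out of an unsupervised clustering step.
labels = np.repeat([0, 1], n_per_cluster)

# Synthetic characters (assumption): char_0 differs between the two clusters,
# while char_1..char_3 are pure noise shared by both clusters.
informative = np.concatenate(
    [rng.normal(0.0, 1.0, n_per_cluster), rng.normal(4.0, 1.0, n_per_cluster)]
)
noise = rng.normal(0.0, 1.0, size=(2 * n_per_cluster, 3))
X = np.column_stack([informative, noise])
names = ["char_0", "char_1", "char_2", "char_3"]

# Mutual information between each character and the cluster labels:
# higher values mean the character tells us more about cluster membership.
mi = mutual_info_classif(X, labels, random_state=0)

# Welch's t-test per character (column-wise) between the two clusters.
t_stat, p_values = ttest_ind(X[labels == 0], X[labels == 1], equal_var=False)

# Rank characters from most to least informative.
ranking = [names[i] for i in np.argsort(mi)[::-1]]
for name, score, p in zip(names, mi, p_values):
    print(f"{name}: MI={score:.3f}, p={p:.3g}")
print("most informative character:", ranking[0])
```

With more than two clusters, the t-test would have to be run per pair of clusters or replaced with a multi-group test; the mutual information step applies unchanged.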


Related Links:
Tropical Ecology and Evolution (TrEE Lab)


1 - Indian Institute of Science, Computer Science and Automation, Statistics and machine learning Lab, CSA Department, Indian Institute of Science (IISc), Bengaluru, KA, 560012, India
2 - Indian Institute of Science Education and Research Bhopal, Biological Sciences, Lab 303, AB-3, IISER Bhopal, Near Bhauri Village, Bhopal bypass road, Bhopal, Bhopal, MP, 462066, India

Keywords:
Taxonomy
Machine Learning
Clustering
species delimitation
Unsupervised analysis
morphology
Species Complex

Presentation Type: Oral Paper
Session: FT2, Floristics & Taxonomy II
Location: Virtual/Virtual
Date: Wednesday, July 29th, 2020
Time: 1:45 PM
Number: FT2001
Abstract ID: 715
Candidate for Awards: None


Copyright © 2000-2020, Botanical Society of America. All rights reserved