Penalized logistic normal multinomial factor analyzers for high dimensional compositional data
Keywords: Model-based clustering, penalized factor analyzers, microbiome data, variational approximation
Model-based clustering uses a finite mixture model to identify underlying patterns or clusters across samples. A finite mixture model is a convex combination of two or more distributions, where the component distributions are chosen according to the type of data. Recently, there has been great interest in clustering human microbiome data. Microbiome data are compositional (yielding relative abundances) and high-dimensional. Previously, a family of logistic normal multinomial factor analyzers (LNM-FA) was proposed for model-based clustering of high-dimensional microbiome data via a factor analyzer structure. This structure reduced the number of parameters and the computational overhead compared with a traditional mixture of logistic normal multinomial models. Here, we propose a penalized LNM-FA (PLNM-FA) model that applies lasso regularization to each entry of the loading matrix. This introduces further parsimony compared with LNM-FA and also estimates the number of latent factors simultaneously. Parameters are estimated using a variational variant of the alternating expectation conditional maximization algorithm to maximize the penalized likelihood. The performance of the proposed algorithm is evaluated using simulation studies and real data.
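The parsimony argument in the abstract can be made concrete with a small sketch. It assumes the usual additive log-ratio (ALR) link of the logistic normal multinomial, so K taxa give d = K - 1 latent Gaussian coordinates, and a factor-analyzer covariance Lambda Lambda^T + Psi per component. The penalty shown is a generic lasso sum over all loading entries, and `soft_threshold` is the standard lasso proximal map used only to illustrate how exact zeros arise; neither is claimed to be the paper's exact penalty or M-step. All function names are hypothetical.

```python
import numpy as np

def alr(p):
    """Additive log-ratio transform (assumed link): maps a K-part composition
    to R^(K-1), using the last part as the reference category."""
    p = np.asarray(p, dtype=float)
    return np.log(p[:-1] / p[-1])

def fa_covariance(loadings, psi):
    """Factor-analyzer covariance for one component: Lambda Lambda^T + diag(psi)."""
    return loadings @ loadings.T + np.diag(psi)

def n_cov_params_full(d):
    """Free parameters in an unrestricted d x d covariance matrix."""
    return d * (d + 1) // 2

def n_cov_params_fa(d, q):
    """Free parameters in Lambda Lambda^T + Psi with a d x q loading matrix:
    d*q loadings, minus q(q-1)/2 for rotational indeterminacy, plus d uniquenesses."""
    return d * q - q * (q - 1) // 2 + d

def lasso_penalty(loadings_per_component, lam):
    """Generic lasso penalty (assumption): lam times the sum of |entries|
    of every component's loading matrix."""
    return lam * sum(np.abs(L).sum() for L in loadings_per_component)

def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal map of t*|x|:
    entries with |x| <= t become exactly zero, others shrink toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def effective_factors(loadings, tol=0.0):
    """Number of loading columns that still contain a nonzero entry."""
    return int(np.sum(np.any(np.abs(loadings) > tol, axis=0)))

if __name__ == "__main__":
    d, q = 99, 3  # e.g. 100 taxa -> 99 ALR coordinates, 3 latent factors
    print(n_cov_params_full(d), n_cov_params_fa(d, q))  # 4950 393

    # A toy 3 x 2 loading matrix whose second column is weak.
    L = np.array([[0.9, 0.05], [0.8, -0.02], [0.7, 0.03]])
    L_sparse = soft_threshold(L, 0.1)
    print(effective_factors(L), effective_factors(L_sparse))  # 2 1
```

The last example shows why penalizing individual loadings can also select the number of factors: once every entry of a column is shrunk to zero, that factor no longer contributes to the covariance, so the effective q drops without refitting separate models for each candidate q.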
Journal of Statistical Research 2022, Vol. 56, No. 2, pp. 185–216
Copyright (c) 2022 Journal of Statistical Research
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.