Penalized logistic normal multinomial factor analyzers for high dimensional compositional data

Authors

  • Wangshu Tu, School of Mathematics and Statistics, Carleton University, 1125 Colonel By Dr, Ottawa, Ontario, Canada K1S 5B6
  • Sanjeena Subedi, School of Mathematics and Statistics, Carleton University, 1125 Colonel By Dr, Ottawa, Ontario, Canada K1S 5B6

DOI:

https://doi.org/10.3329/jsr.v56i2.67469

Keywords:

Model-based clustering, penalized factor analyzers, microbiome data, variational approximation

Abstract

Model-based clustering uses a finite mixture model to identify underlying patterns or clusters across samples. A finite mixture model is a convex combination of two or more distributions, where the component distributions are chosen according to the type of data. Recently, there has been great interest in clustering human microbiome data. Microbiome data are compositional (yielding relative abundances) and high-dimensional. Previously, a family of logistic normal multinomial factor analyzers (LNM-FA) was proposed for model-based clustering of high-dimensional microbiome data. Its factor analyzer structure reduces the number of parameters and the computational overhead compared to a traditional mixture of logistic normal multinomial models. Here, we propose a penalized LNM-FA (PLNM-FA) model that applies lasso regularization to each entry of the loading matrix. This introduces further parsimony compared to LNM-FA and simultaneously estimates the number of latent factors. Parameter estimation is carried out using a variational variant of the alternating expectation conditional maximization algorithm to maximize the penalized likelihood. The performance of the proposed algorithm is evaluated using simulation studies and real data.
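
As a rough illustration of how an entrywise lasso penalty on a loading matrix can also select the number of latent factors, the sketch below applies soft-thresholding (the standard proximal operator of the lasso penalty) to a small loading matrix and counts the columns that remain nonzero. This is not the variational AECM update used for PLNM-FA. The matrix values, the penalty strength `lam`, and the helper names `soft_threshold` and `count_active_factors` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Elementwise lasso proximal operator: shrink each entry toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def count_active_factors(loadings, tol=1e-12):
    """Count columns (latent factors) that keep at least one nonzero loading."""
    return int(np.sum(np.any(np.abs(loadings) > tol, axis=0)))


# Hypothetical 5 x 3 loading matrix: 5 variables, 3 candidate latent factors.
loadings = np.array([
    [0.90, 0.05, 0.02],
    [0.80, -0.60, 0.01],
    [0.03, 0.70, -0.04],
    [-0.50, 0.02, 0.03],
    [0.04, -0.40, 0.02],
])

lam = 0.1  # hypothetical penalty strength
shrunk = soft_threshold(loadings, lam)

# A column whose entries are all shrunk to zero no longer contributes a factor.
n_factors = count_active_factors(shrunk)
print(shrunk)
print("active factors:", n_factors)
```

In this toy example every entry of the third column is smaller in magnitude than `lam`, so the whole column is set to zero and only two factors stay active. Larger values of `lam` would remove more entries and, eventually, more columns.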

Journal of Statistical Research 2022, Vol. 56, No. 2, pp. 185–216

Published

2023-07-09

How to Cite

Tu, W., & Subedi, S. (2023). Penalized logistic normal multinomial factor analyzers for high dimensional compositional data. Journal of Statistical Research, 56(2), 185–216. https://doi.org/10.3329/jsr.v56i2.67469
