ORIGINAL RESEARCH article
Front. Microbiol.
Sec. Aquatic Microbiology
From microbial diversity to functional potential using dimensionality reduction
Emelia Janthina Chamberlain 1
William Boulton 2
Elizabeth Connors 1
Theodore Calianos 1
Jeff Shovlowsky Bowman 3
Jessie Creamean 4
Thomas Mock 2
Heather Hyewon Kim 1
1. Woods Hole Oceanographic Institution, Woods Hole, United States
2. University of East Anglia, Norwich, United Kingdom
3. University of California San Diego Scripps Institution of Oceanography, La Jolla, United States
4. Colorado State University, Fort Collins, United States
Abstract
Machine learning (ML) has become an increasingly prevalent tool in microbial oceanography. The high dimensionality of microbial diversity data from 'omics observations is well suited to ML analysis, and many recent studies showcase its utility for exploratory ecological feature finding and process prediction. Here, we compare the Self-Organizing Map (SOM) to two other well-documented dimensionality reduction methods, Principal Coordinate Analysis (PCoA) and Weighted Gene Correlation Network Analysis (WGCNA), using near-daily 16S rRNA gene amplicon sequencing data from the 2019-2020 MOSAiC International Arctic Drift Expedition. We then compare k-means clustering outputs from each method to available metagenomes, extracting functionally distinct seasonal microbial ecotypes in the surface Arctic Ocean. Our results indicate that the SOM method better represented expected seasonal transitions and identified a greater number of metabolically distinct functional groups than the more traditional PCoA ordination. To investigate the importance of biological context in dimensionality reduction, we also compare these sample-based functional outputs to a taxa-clustering approach using a k-means-adapted WGCNA correlation network. Ultimately, we identified four community ecotypes with distinct taxonomic and functional cut-offs driven by seasonality, water mass, and substrate turnover, highlighting the importance of succession in functional diversity for the central Arctic Ocean. These results further reinforce ML methods as a meaningful translator when mining historical amplicon datasets to address modern mechanistic questions, and they could provide 'omics-informed ecotype diversity for use in mechanistic biogeochemical models.
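The sample-based workflow summarized above (train a SOM on community profiles, then group the map units with k-means) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the count table is invented toy data, and the function names, the 3 x 3 grid, the learning-rate and neighborhood schedules, and the choice of k = 2 are all assumptions made for the example.

```python
import math
import random

def relative_abundance(counts):
    """Convert one sample's raw counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(weights, sample):
    """Index of the SOM unit whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], sample))

def train_som(samples, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM (all hyperparameters are assumptions)."""
    rng = random.Random(seed)
    # Initialise every unit from a randomly chosen sample.
    weights = [list(rng.choice(samples)) for _ in range(rows * cols)]
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.3)  # shrinking neighborhood radius
        for sample in rng.sample(samples, len(samples)):
            br, bc = coords[best_matching_unit(weights, sample)]
            for i, (r, c) in enumerate(coords):
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
                weights[i] = [w + lr * h * (x - w)
                              for w, x in zip(weights[i], sample)]
    return weights

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy count table (rows = samples, columns = taxa); values are invented.
counts = [
    [90, 5, 3, 2],
    [85, 8, 4, 3],
    [88, 6, 4, 2],
    [5, 10, 40, 45],
    [4, 12, 42, 42],
    [6, 9, 45, 40],
]
profiles = [relative_abundance(row) for row in counts]
som = train_som(profiles)
unit_labels = kmeans(som, k=2)
# Each sample inherits the cluster label of its best-matching SOM unit.
ecotypes = [unit_labels[best_matching_unit(som, p)] for p in profiles]
print(ecotypes)
```

Clustering the map units rather than the raw samples is one common way to pair a SOM with k-means; a PCoA-based comparison would instead run k-means on ordination coordinates derived from a sample dissimilarity matrix.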
Summary
Keywords
Arctic Ocean, Bacteria, ecosystem function, machine learning, Microbial Diversity
Received
12 January 2026
Accepted
03 April 2026
Copyright
© 2026 Chamberlain, Boulton, Connors, Calianos, Bowman, Creamean, Mock and Kim. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Emelia Janthina Chamberlain; Heather Hyewon Kim
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.