ORIGINAL RESEARCH article

Front. Microbiol.

Sec. Aquatic Microbiology

From microbial diversity to functional potential using dimensionality reduction

  • 1. Woods Hole Oceanographic Institution, Woods Hole, United States

  • 2. University of East Anglia, Norwich, United Kingdom

  • 3. University of California San Diego Scripps Institution of Oceanography, La Jolla, United States

  • 4. Colorado State University, Fort Collins, United States

The final, formatted version of the article will be published soon.

Abstract

Machine learning (ML) has become an increasingly prevalent tool in microbial oceanography. The high dimensionality of microbial diversity data from 'omics observations makes these data well suited to ML analysis, and many recent studies have showcased its utility for exploratory ecological feature finding and process prediction. Here, we compare the Self-Organizing Map (SOM) to two other well-documented dimensionality reduction methods, Principal Coordinate Analysis (PCoA) and Weighted Gene Correlation Network Analysis (WGCNA), using near-daily 16S rRNA gene amplicon sequencing data from the 2019-2020 MOSAiC International Arctic Drift Expedition. We then compare k-means clustering outputs from each method to available metagenomes, extracting functionally distinct seasonal microbial ecotypes in the surface Arctic Ocean. Our results indicate that the SOM method better represented expected seasonal transitions and identified a greater number of metabolically distinct functional groups than the more traditional PCoA ordination. To investigate the importance of biological context in dimensionality reduction, we also compare these sample-based functional outputs to a taxa clustering approach using a k-means-adapted WGCNA correlation network. Ultimately, we identified four community ecotypes with distinct taxonomic and functional cut-offs driven by seasonality, water mass, and substrate turnover, highlighting the importance of succession in functional diversity for the central Arctic Ocean. These results further reinforce ML methodologies as a meaningful translator in the mining of historical amplicon datasets to address modern mechanistic questions and, potentially, to provide 'omics-informed ecotype diversity for use in mechanistic biogeochemical models.
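As a rough illustration of the sample-based workflow summarized in the abstract (ordination followed by k-means clustering), the sketch below runs a classical PCoA on Bray-Curtis dissimilarities and then clusters the ordination scores. It is not the authors' pipeline: the choice of Bray-Curtis, the function names (`bray_curtis`, `pcoa`, `kmeans`), the farthest-first initialization, and the synthetic "winter" and "summer" count table are all assumptions made for illustration, and the SOM and WGCNA steps are not shown.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (samples)."""
    counts = np.asarray(counts, dtype=float)
    diff = np.abs(counts[:, None, :] - counts[None, :, :]).sum(axis=2)
    total = (counts[:, None, :] + counts[None, :, :]).sum(axis=2)
    return diff / total


def pcoa(dist, n_axes=2):
    """Classical PCoA: Gower-centre the squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1][:n_axes]  # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues can occur for non-Euclidean dissimilarities; clip them.
    return eigvecs * np.sqrt(np.clip(eigvals, 0, None))


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Synthetic count table (assumption): 6 "winter" and 6 "summer" samples, 5 taxa.
rng = np.random.default_rng(0)
winter = rng.poisson(lam=[50, 40, 5, 5, 1], size=(6, 5))
summer = rng.poisson(lam=[5, 5, 50, 40, 1], size=(6, 5))
counts = np.vstack([winter, summer])

scores = pcoa(bray_curtis(counts), n_axes=2)  # sample coordinates in PCoA space
labels = kmeans(scores, k=2)                  # one cluster label per sample
```

With real amplicon data, each cluster label would then be compared against metagenome-derived functional profiles for the same samples, which is the step the abstract describes for identifying functionally distinct ecotypes.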

Keywords

Arctic Ocean, bacteria, ecosystem function, machine learning, microbial diversity

Received

12 January 2026

Accepted

03 April 2026

Copyright

© 2026 Chamberlain, Boulton, Connors, Calianos, Bowman, Creamean, Mock and Kim. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Emelia Janthina Chamberlain; Heather Hyewon Kim

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
