
Low-Rank Optimization for Memory-efficient Large Language Model Pretraining

This project explores improved methods for low-rank optimization in memory-efficient LLM training.

Most existing approaches select the dominant subspace of the gradient to preserve gradient information, as this intuitively provides the best low-rank approximation. However, we find that in practice the dominant subspace stops evolving during pretraining, which confines weight updates to a nearly fixed subspace.

In this study, we propose importance sampling subspace selection (I3S) for low-rank optimization, which theoretically offers a convergence rate comparable to the dominant-subspace approach. Empirically, we demonstrate that I3S significantly outperforms previous methods on LLM pretraining tasks.
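The idea can be sketched in a few lines. Instead of always taking the top-k left-singular vectors of the gradient (the dominant subspace), one samples k directions with probabilities derived from the singular values, so lower-energy directions are occasionally explored. This is a minimal illustrative sketch, not the paper's implementation: the function name `i3s_subspace` and the choice of probabilities proportional to squared singular values are assumptions; the paper's exact sampling scheme may differ.

```python
import numpy as np

def i3s_subspace(grad, rank, rng=None):
    """Sample a rank-`rank` projection basis from the gradient's SVD.

    Assumption (illustrative only): sampling probabilities are taken
    proportional to the squared singular values of the gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Thin SVD of the gradient matrix.
    U, s, _ = np.linalg.svd(grad, full_matrices=False)
    probs = s**2 / np.sum(s**2)
    # Sample `rank` distinct left-singular directions instead of
    # deterministically taking the top-`rank` (dominant) ones.
    idx = rng.choice(len(s), size=rank, replace=False, p=probs)
    return U[:, idx]  # projection basis P; low-rank gradient = P.T @ grad

# Dominant-subspace baseline, for contrast, would simply be U[:, :rank].
rng = np.random.default_rng(0)
G = rng.standard_normal((64, 32))
P = i3s_subspace(G, rank=8, rng=rng)
print(P.shape)  # (64, 8)
```

Because the sampled columns come from an orthonormal basis, the resulting projection remains orthonormal, so it plugs into the same low-rank optimizer machinery as a dominant-subspace projection.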

The paper is under review at ICML 2025: I3S: Importance Sampling Subspace Selection for Low-Rank Optimization in LLM Pretraining
