Language model compression with weighted low-rank factorization


Factorizing a large matrix into small matrices is a popular strategy for model compression. Singular value decomposition (SVD) plays a vital role in this strategy, approximating a learned matrix with fewer parameters. However, SVD minimizes the squared error of reconstructing the original matrix without gauging the importance of the parameters, potentially giving a larger reconstruction error for the parameters that affect task accuracy more. In other words, the optimization objective of SVD is not aligned with task accuracy. In this work, we propose using Fisher information to weigh the importance of the parameters affecting the model prediction, and then performing a weighted SVD to factorize the learned matrices of a neural network model. Although our factorized matrices do not necessarily have a smaller reconstruction error, they retain better task accuracy. Our analysis with transformer-based language models shows that our weighted SVD significantly reduces the misalignment between the low-rank factorization objective and task accuracy.
Our evaluation on already compact models shows that our method can further reduce the parameter count by 9% to 30% without affecting task accuracy.
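
The idea of an importance-weighted factorization can be illustrated with a minimal NumPy sketch. It rests on assumptions that the abstract does not spell out: each row of the weight matrix shares a single importance weight (the sum of per-parameter importance scores in that row), which lets the weighted problem be solved with an ordinary SVD of the row-scaled matrix. The function names (`row_weights`, `plain_svd`, `row_weighted_svd`, `weighted_error`) are hypothetical, and the importance scores are random squared values standing in for Fisher information, which in practice would be estimated from squared gradients of the task loss.

```python
import numpy as np


def row_weights(importance, eps=1e-8):
    """Square root of the summed importance per row (assumed row-wise weighting)."""
    return np.sqrt(importance.sum(axis=1)) + eps


def plain_svd(W, rank):
    """Standard truncated SVD: W is approximated by A @ B."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank, :]


def row_weighted_svd(W, importance, rank):
    """Truncated SVD of the row-scaled matrix, with the scaling undone in A."""
    w = row_weights(importance)
    U, S, Vt = np.linalg.svd(w[:, None] * W, full_matrices=False)
    A = (U[:, :rank] * S[:rank]) / w[:, None]  # undo the row scaling
    B = Vt[:rank, :]
    return A, B


def weighted_error(W, W_hat, importance):
    """Squared reconstruction error with each row weighted by its importance."""
    w = row_weights(importance)
    return float(np.sum((w[:, None] * (W - W_hat)) ** 2))


# Toy data: random weights and random stand-in importance scores (not real Fisher values).
rng = np.random.default_rng(0)
m, n, rank = 64, 48, 8
W = rng.standard_normal((m, n))
importance = rng.standard_normal((m, n)) ** 2

A_p, B_p = plain_svd(W, rank)
A_w, B_w = row_weighted_svd(W, importance, rank)

# Both factorizations store rank * (m + n) parameters instead of m * n.
print("parameters:", m * n, "->", rank * (m + n))
print("plain SVD    weighted error:", weighted_error(W, A_p @ B_p, importance))
print("weighted SVD weighted error:", weighted_error(W, A_w @ B_w, importance))
print("plain SVD    unweighted error:", float(np.sum((W - A_p @ B_p) ** 2)))
print("weighted SVD unweighted error:", float(np.sum((W - A_w @ B_w) ** 2)))
```

Under these assumptions, the weighted variant cannot have a larger importance-weighted error than plain SVD at the same rank, while plain SVD keeps the smaller unweighted error. This mirrors the trade-off described above: a factorization with a higher raw reconstruction error can still better preserve the parameters that matter for the task.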

Authors: Yen-Chang Hsu, Ting Hua, Sung-En Chang, Qian Lou, Yilin Shen, Hongxia Jin

Published: International Conference on Learning Representations (ICLR)

Date: Apr 25, 2022