Numerical Optimizations for Weighted Value Decomposition on Language Models
Singular value decomposition (SVD) is one of the most popular compression methods, approximating a target matrix with the product of smaller matrices. However, standard SVD treats all parameters within the matrix as equally important, which is a simple but unrealistic assumption: in practice, the parameters of a trained neural network affect task performance unevenly, suggesting unequal importance among them. This paper therefore proposes Fisher information weighted Value Decomposition (FVD) to compress a neural network model with awareness of parameter importance. Unlike standard SVD, FVD is a non-convex optimization problem that lacks a closed-form solution, so optimizing it is non-trivial.
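The contrast between the two objectives can be sketched numerically. The snippet below is an illustrative toy, not the paper's implementation: it compares closed-form truncated SVD against a simple gradient-descent optimizer for an importance-weighted low-rank objective, with the importance matrix `I`, the factors `A`/`B`, and the learning rate all being assumed names and settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a weight matrix W and a per-parameter importance
# matrix I (standing in for estimated Fisher information). Both are
# random toys here, not values from the paper.
W = rng.normal(size=(64, 32))
I = rng.uniform(0.1, 1.0, size=W.shape)

rank = 8

# Standard truncated SVD: the closed-form best rank-r approximation
# under the *unweighted* Frobenius norm.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_svd = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]

# Weighted objective: minimize sum_ij I_ij * (W_ij - (A @ B)_ij)^2.
# This has no closed form in general, so we run plain gradient descent
# from the SVD solution as initialization (one of several possible
# optimization strategies).
A = U[:, :rank] * s[:rank]
B = Vt[:rank].copy()
lr = 1e-2
for _ in range(2000):
    R = I * (A @ B - W)      # importance-weighted residual
    A -= lr * (R @ B.T)      # gradient step on the left factor
    B -= lr * (A.T @ R)      # gradient step on the right factor

def weighted_err(X):
    """Importance-weighted squared reconstruction error."""
    return float(np.sum(I * (W - X) ** 2))

# Starting from the SVD factors, descent can only lower the weighted
# objective, so the optimized factors should do at least as well.
print(weighted_err(W_svd), weighted_err(A @ B))
```

Initializing from the truncated SVD is a natural choice here, since it already minimizes the unweighted objective and descent then trades that off against the importance weights.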
We systematically investigate multiple optimization strategies for this problem and evaluate our method by compressing transformer-based language models.
Further, we design a metric that predicts when SVD is likely to introduce a significant performance drop, in which case our FVD can serve as a rescue strategy.
Extensive evaluations demonstrate that FVD performs comparably to, or even better than, current SOTA methods in compressing Transformer-based language models.
In addition, an analysis of individual Transformer blocks shows that FVD achieves significant performance improvements over SVD on sub-structure factorization.
Authors: Ting Hua, Yen-Chang Hsu, Felicity Wang, Yilin Shen, Hongxia Jin
Published: Conference on Empirical Methods in Natural Language Processing (EMNLP)
Date: Dec 9, 2022