Foreground-Specialized Model Imitation for Instance Segmentation
Abstract
We leverage knowledge distillation to address object instance segmentation for robots with limited computational power. Instance segmentation is formulated as a multi-task learning problem involving object classification, localization, and mask prediction. However, knowledge distillation is well-suited to only one of these sub-tasks, namely multi-class object classification. To address this challenge, we introduce a novel distillation method in which the teacher is a small foreground-specialized (FS) model. We train the FS instance segmentation teacher model using images containing only foreground objects, i.e., with background pixels removed. As a result, the FS model excels at object classification, which is precisely the sub-task that distillation is designed for. To accommodate the difference between the inputs used by the teacher and the student, we introduce a novel Foreground-Specialized model Imitation (FSI) method with two complementary modules. First, a reciprocal anchor box selection method distills from the most informative outputs of the teacher model. Second, to embed foreground-awareness in the student's feature learning, we propose two solutions: adding a co-learned foreground segmentation branch, or applying a soft feature mask. We conducted an extensive evaluation with the state-of-the-art one-stage instance segmentation method YOLACT, which is suitable for on-device inference. Experimental results on the MS COCO and Pascal VOC datasets demonstrate that our method significantly outperforms knowledge distillation baselines in both accuracy and training efficiency.
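Two ingredients from the abstract can be illustrated concretely: constructing the foreground-only images used to train the FS teacher (background pixels removed via the union of ground-truth instance masks), and the soft feature mask that down-weights background activations in the student. The sketch below is a minimal, hedged interpretation with NumPy; the function names, tensor layouts, and background fill value are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def foreground_only_image(image, instance_masks, background_value=0):
    """Build a training image for the FS teacher by removing background.
    image: (H, W, C) array; instance_masks: list of (H, W) boolean
    ground-truth masks (assumed input convention)."""
    # Union of all instance masks -> binary foreground mask of shape (H, W)
    fg = np.any(np.stack(instance_masks, axis=0), axis=0)
    out = np.full_like(image, background_value)
    out[fg] = image[fg]  # keep only foreground pixels
    return out

def soft_feature_mask(features, fg_prob):
    """Soft-mask variant: scale student feature activations by a
    predicted foreground probability map (hypothetical layout).
    features: (C, H, W); fg_prob: (H, W) with values in [0, 1]."""
    return features * fg_prob[None, :, :]
```

In this reading, the hard binary mask is applied only to the teacher's training inputs, while the student sees full images and uses the soft, learned mask so that gradients still flow through background regions.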
Authors: Wenbo Li, Hongxia Jin
Published: Asian Conference on Computer Vision (ACCV)
Date: Dec 4, 2022