NeurIPS 2020
Why self-supervised learning or learning from extra data?
- Huge amount of raw image and video data available
- Ability to cover wider domains and information
Initial work focused on pretext (proxy) tasks; recent work has shifted to contrastive learning.
Self-training with Noisy Student improves ImageNet classification
Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le
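The method is an iterative teacher-student loop: train a teacher on labeled data, use it to pseudo-label a large unlabeled set, train a noised (and usually larger) student on the combined data, then repeat with the student as the new teacher. A minimal PyTorch-style sketch under those assumptions is below; `make_model`, the batch lists, and the optimizer settings are placeholders, and the paper additionally relies on RandAugment, dropout, stochastic depth, and filtering/balancing of the pseudo-labeled data.

```python
import torch
import torch.nn.functional as F

def train_supervised(model, batches, optimizer):
    # One pass of standard supervised training over (image, label) batches.
    model.train()
    for x, y in batches:
        loss = F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def pseudo_label(teacher, unlabeled_batches):
    # The (un-noised) teacher infers hard pseudo-labels for unlabeled images.
    teacher.eval()
    labeled = []
    with torch.no_grad():
        for x in unlabeled_batches:
            labeled.append((x, teacher(x).argmax(dim=1)))
    return labeled

def noisy_student(make_model, labeled_batches, unlabeled_batches, rounds=3):
    # Train the initial teacher on labeled data only.
    teacher = make_model()
    train_supervised(teacher, labeled_batches,
                     torch.optim.SGD(teacher.parameters(), lr=0.1))
    for _ in range(rounds):
        pseudo = pseudo_label(teacher, unlabeled_batches)
        # Train an equal-or-larger *noised* student (dropout / stochastic depth in the
        # model, strong input augmentation) on labeled + pseudo-labeled data.
        # The paper mixes both data sources during training; they are concatenated here.
        student = make_model()
        opt = torch.optim.SGD(student.parameters(), lr=0.1)
        train_supervised(student, list(labeled_batches) + pseudo, opt)
        # The student becomes the teacher for the next round.
        teacher = student
    return teacher
```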


Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko
Objective:
Move away from the negative samples required by contrastive learning.
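BYOL trains an online network (encoder + projector + predictor) to predict the projection produced by a slowly moving target network, an exponential moving average of the online weights, for a different augmented view of the same image, with a stop-gradient on the target branch. A minimal PyTorch-style sketch follows; the `encoder`, `feat_dim`, `proj_dim`, and `tau` values are placeholders, and the real projector/predictor MLPs also include batch normalization.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class BYOL(nn.Module):
    # Online network (encoder + projector + predictor) learns to predict the output
    # of a slowly moving target network; no negative pairs are involved.
    def __init__(self, encoder, feat_dim=512, proj_dim=256, tau=0.996):
        super().__init__()
        self.online_encoder = encoder
        self.online_projector = nn.Sequential(
            nn.Linear(feat_dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
        self.predictor = nn.Sequential(
            nn.Linear(proj_dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
        # Target network: an EMA copy of the online encoder + projector,
        # never updated by gradients.
        self.target_encoder = copy.deepcopy(encoder)
        self.target_projector = copy.deepcopy(self.online_projector)
        for p in list(self.target_encoder.parameters()) + list(self.target_projector.parameters()):
            p.requires_grad = False
        self.tau = tau

    def loss(self, view1, view2):
        # view1, view2: two differently augmented views of the same image batch.
        def regression(online_view, target_view):
            p = self.predictor(self.online_projector(self.online_encoder(online_view)))
            with torch.no_grad():  # stop-gradient on the target branch
                z = self.target_projector(self.target_encoder(target_view))
            # Equivalent to MSE between L2-normalized vectors.
            return (2 - 2 * F.cosine_similarity(p, z, dim=-1)).mean()
        # Symmetrized: each view predicts the target projection of the other.
        return regression(view1, view2) + regression(view2, view1)

    @torch.no_grad()
    def update_target(self):
        # Exponential moving average of the online parameters after each optimizer step.
        for online, target in [(self.online_encoder, self.target_encoder),
                               (self.online_projector, self.target_projector)]:
            for po, pt in zip(online.parameters(), target.parameters()):
                pt.data = self.tau * pt.data + (1 - self.tau) * po.data
```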


Big Self-Supervised Models are Strong Semi-Supervised Learners (SimCLRv2)
Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton
Contributions and findings:
- Bigger Models Are More Label-Efficient
- Bigger/Deeper Projection Heads Improve Representation Learning
- Distillation Using Unlabeled Data Improves Semi-Supervised Learning
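The third finding can be illustrated with the distillation loss applied to unlabeled images: the student is trained to match the teacher's temperature-scaled class distribution, so no ground-truth labels are needed. A minimal sketch, where `teacher`, `student`, and the temperature value are placeholders:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Cross-entropy between the teacher's temperature-scaled distribution
    # and the student's prediction, computed on unlabeled images.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

# Sketch of one training step on an unlabeled batch:
# with torch.no_grad():
#     t_logits = teacher(unlabeled_batch)
# loss = distillation_loss(student(unlabeled_batch), t_logits)
```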


Rethinking Pre-training and Self-training
Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le
- Stronger data augmentation and more labeled data further diminish the value of pre-training
- Unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes.
- When pre-training is helpful, self-training improves upon it further.

For example, on the COCO object detection dataset, pre-training helps when one fifth of the labeled data is used, but hurts accuracy when all labeled data is used. Self-training, on the other hand, shows positive improvements of +1.3 to +3.4 AP across all dataset sizes.

Self-training helps in high data/strong augmentation regimes, even when pre-training hurts

Self-training works across dataset sizes and is additive to pre-training.

Self-supervised pre-training also hurts in the high-data/strong-augmentation regimes where self-training helps.

An intuition for the weak performance of pre-training is that pre-training is not aware of the task
of interest and can fail to adapt. Such adaptation is often needed when switching tasks because, for example, good features for ImageNet may discard positional information which is needed for COCO.
Conclusion:
- Self-training works well in every setup that we tried: low data regime, high data regime, weak data augmentation, and strong data augmentation.
- Self-training is also effective across architectures (ResNet, EfficientNet, SpineNet, FPN, NAS-FPN), data sources (ImageNet, OID, PASCAL, COCO) and tasks (object detection, segmentation).
- In terms of generality, self-training works well not only when pre-training fails but also when it succeeds; in terms of scalability, it continues to perform well as more labeled data and better models become available.
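A compact sketch of how self-training can be layered on top of pre-training in a single training step: the model is (optionally) initialized from a pre-trained checkpoint, and each step combines a supervised loss on human-labeled data with a loss on teacher-generated pseudo-labels. The names below (`model`, `teacher`, the batches, and the pseudo-label weight) are placeholders; the paper applies this idea to detection and segmentation with strong augmentation, whereas a classification loss is used here for brevity.

```python
import torch
import torch.nn.functional as F

def self_training_step(model, teacher, labeled_batch, unlabeled_images,
                       optimizer, pseudo_weight=1.0):
    # Supervised loss on human-labeled data.
    x, y = labeled_batch
    loss_human = F.cross_entropy(model(x), y)

    # Pseudo-labels generated by a frozen teacher on unlabeled images.
    with torch.no_grad():
        pseudo_y = teacher(unlabeled_images).argmax(dim=1)
    loss_pseudo = F.cross_entropy(model(unlabeled_images), pseudo_y)

    # Joint objective: human loss plus weighted pseudo-label loss.
    loss = loss_human + pseudo_weight * loss_pseudo
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
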
Self-supervised learning through the eyes of a child
A. Emin Orhan, Vaibhav V. Gupta, Brenden M. Lake
The SAYCam video dataset, collected from head-mounted cameras worn by children as they grow.
Three SSL techniques are evaluated:
- Temporal classification: the video is split into temporal segments, and the model is trained to classify which segment a frame comes from (frames close in time share a label).
- Static contrastive learning: momentum contrast (MoCo) on individual frames.
- Temporal contrastive learning: temporally adjacent frames are treated as positive pairs.
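For the temporal-contrastive variant, here is a simplified in-batch sketch: temporally adjacent frames form positive pairs, and all other frames in the batch serve as negatives. The `encoder` and `temperature` are placeholders, and the paper actually builds on momentum contrast with a memory queue rather than in-batch negatives.

```python
import torch
import torch.nn.functional as F

def temporal_nce_loss(encoder, frames_t, frames_t_next, temperature=0.07):
    # frames_t, frames_t_next: a batch of frames and their temporally adjacent frames.
    # Adjacent frames are treated as positive pairs; every other frame in the batch
    # serves as a negative.
    z_a = F.normalize(encoder(frames_t), dim=-1)
    z_b = F.normalize(encoder(frames_t_next), dim=-1)
    logits = z_a @ z_b.t() / temperature                        # pairwise similarities
    targets = torch.arange(z_a.size(0), device=logits.device)   # positive = matching index
    return F.cross_entropy(logits, targets)
```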

The models also appear to learn from contextual cues, since the activation maps cover regions larger than the objects themselves.


Limitations:
- Due to the limited recording frequency, the dataset covers only about one week's worth of a child's visual experience.
- No multi-modal learning from multi-sensory inputs.
- Self-supervised algorithms rely on unrealistic, hand-crafted augmentations; a more intuitive form of self-supervised learning would move away from such engineered augmentations.