NeurIPS 2020

Why self-supervised learning or learning from extra data?

Initial work focused on pretext (proxy) tasks; more recent work has shifted to contrastive learning.
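As background, a minimal InfoNCE-style contrastive loss (a generic sketch, not tied to any one of the papers below): embeddings of two augmented views of the same image are pulled together, and the other images in the batch serve as negatives.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE sketch: z1[i] and z2[i] embed two augmented views of
    image i; every other image in the batch acts as a negative."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # (N, N) cosine similarities
    labels = torch.arange(z1.size(0))        # the positive pair is on the diagonal
    return F.cross_entropy(logits, labels)

# toy usage with random 128-d embeddings of a batch of 8 image pairs
loss = info_nce_loss(torch.randn(8, 128), torch.randn(8, 128))
```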

Self-training with Noisy Student improves ImageNet classification

Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le

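The recipe, as a condensed sketch with placeholder model/loader names (not the authors' code): train a teacher on labeled ImageNet, use it to pseudo-label unlabeled images, train an equal-or-larger student on both with noise (strong augmentation, dropout, stochastic depth), then iterate with the student as the new teacher.

```python
import torch
import torch.nn.functional as F

def noisy_student_round(teacher, student, labeled_loader, unlabeled_loader, optimizer):
    """One self-training round. In the paper the teacher pseudo-labels clean
    images while the student sees noised ones; here both are folded into the
    loaders for brevity."""
    teacher.eval()
    student.train()   # dropout / stochastic depth act as noise only on the student
    for (x_l, y_l), x_u in zip(labeled_loader, unlabeled_loader):
        with torch.no_grad():
            pseudo = teacher(x_u).softmax(dim=1)      # soft pseudo-labels
        loss = F.cross_entropy(student(x_l), y_l) + \
               F.cross_entropy(student(x_u), pseudo)  # probabilistic targets (PyTorch >= 1.10)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student    # becomes the teacher for the next, possibly larger, round
```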

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

Objective:

Moving away from negative samples in contrastive learning

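A minimal sketch of the BYOL objective with placeholder module names: an online encoder with an extra predictor head regresses the projection produced by a slowly moving (EMA) target encoder, so no negative pairs are required.

```python
import torch
import torch.nn.functional as F

def byol_loss(p_online, z_target):
    """Normalized MSE between the online predictor output and the target projection."""
    p = F.normalize(p_online, dim=1)
    z = F.normalize(z_target, dim=1)
    return (2 - 2 * (p * z).sum(dim=1)).mean()

@torch.no_grad()
def ema_update(target, online, tau=0.996):
    """Target weights track an exponential moving average of the online weights."""
    for t, o in zip(target.parameters(), online.parameters()):
        t.mul_(tau).add_((1 - tau) * o)

# usage sketch with two augmented views v1, v2 of the same batch:
#   loss = byol_loss(predictor(online(v1)), target(v2).detach()) + \
#          byol_loss(predictor(online(v2)), target(v1).detach())
#   loss.backward(); optimizer.step(); ema_update(target, online)
```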

Big Self-Supervised Models are Strong Semi-Supervised Learners

Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton

Contributions and findings:

A three-step pipeline: task-agnostic self-supervised pre-training of a big ResNet (SimCLRv2), supervised fine-tuning on the few available labels, then distillation with unlabeled examples into a smaller task-specific model. The fewer the labels, the more the approach benefits from a bigger pre-trained network.

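A sketch of the distillation step of that pipeline (placeholder names): the fine-tuned big model supplies soft labels on unlabeled images for a smaller task-specific student.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy between the teacher's and student's softened distributions,
    computed on unlabeled images."""
    t = F.softmax(teacher_logits / temperature, dim=1)
    log_s = F.log_softmax(student_logits / temperature, dim=1)
    return -(t * log_s).sum(dim=1).mean()

# toy usage; in practice: distill_loss(student(x_u), teacher(x_u).detach())
loss = distill_loss(torch.randn(4, 10), torch.randn(4, 10))
```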

Rethinking Pre-training and Self-training

Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

For example, on the COCO object detection dataset, pre-training helps when we use only one fifth of the labeled data, but hurts accuracy when we use all of the labeled data. Self-training, on the other hand, shows positive improvements of +1.3 to +3.4 AP across all dataset sizes.
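The self-training setup behind these numbers, as a generic pseudo-labeling sketch with placeholder names (the paper additionally normalizes the two loss terms): a teacher trained on the labeled set predicts pseudo boxes/labels for extra images, and a student is trained jointly on human and pseudo labels under strong augmentation.

```python
def self_training_loss(student, augment, detection_loss,
                       x_labeled, y_human, x_extra, y_pseudo, alpha=1.0):
    """Joint objective over human annotations and teacher-generated pseudo-labels."""
    loss_human = detection_loss(student(augment(x_labeled)), y_human)
    loss_pseudo = detection_loss(student(augment(x_extra)), y_pseudo)
    return loss_human + alpha * loss_pseudo
```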


Self-training helps in high-data/strong-augmentation regimes, even when pre-training hurts.


Self-training works across dataset sizes and is additive to pre-training.


Self-supervised pre-training also hurts in high-data/strong-augmentation regimes, where self-training still helps.


An intuition for the weak performance of pre-training is that it is not aware of the task of interest and can fail to adapt. Such adaptation is often needed when switching tasks: for example, good features for ImageNet classification may discard the positional information that COCO detection needs.

Conclusion:

Self-training is the more general and flexible option: it helps across dataset sizes and augmentation strengths, and its gains are additive to pre-training, whereas (self-supervised) pre-training can hurt once abundant labeled data and strong augmentation are available.

Self-supervised learning through the eyes of a child

A. Emin Orhan, Vaibhav V. Gupta, Brenden M. Lake

SAYCam: a video dataset collected from head-mounted cameras worn by children as they grow up.

3 SSL techniques:

  1. Temporal classification - the video is split into short episodes, and the model classifies which episode a frame comes from
  2. Static contrastive learning - momentum contrast (MoCo) on individual frames
  3. Temporal contrastive learning - temporally adjacent frames are treated as positive pairs (sketched below)
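A minimal sketch (an illustrative assumption, not the authors' code) of how the temporal-contrastive variant can form positive pairs: embeddings of neighbouring frames play the role of the two augmented "views" in the contrastive loss sketched at the top of these notes.

```python
import torch

def adjacent_frame_pairs(frame_embeddings):
    """frame_embeddings: (T, D) embeddings of T consecutive video frames.
    Pair i is (frame_i, frame_{i+1}); feed the two batches to an
    InfoNCE-style loss as positives, with other frames as negatives."""
    return frame_embeddings[:-1], frame_embeddings[1:]

z1, z2 = adjacent_frame_pairs(torch.randn(16, 128))
# loss = info_nce_loss(z1, z2)   # reusing the earlier contrastive sketch
```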


The model learns from contextual cues as well, since the activation maps spread over regions larger than the objects themselves.
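For illustration only (my assumption about how such maps are typically computed, not the paper's exact procedure), a class-activation-map sketch: the classifier weights for a class re-weight the spatial feature maps that precede global average pooling.

```python
import torch

def class_activation_map(feature_maps, classifier_weights, class_idx):
    """feature_maps: (C, H, W) conv features before global average pooling.
    classifier_weights: (num_classes, C) final linear-layer weights.
    Returns an (H, W) map of where the evidence for `class_idx` comes from."""
    cam = torch.einsum('c,chw->hw', classifier_weights[class_idx], feature_maps)
    cam = torch.relu(cam)
    return cam / (cam.max() + 1e-8)   # normalize to [0, 1]

cam = class_activation_map(torch.randn(512, 7, 7), torch.randn(1000, 512), class_idx=3)
```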


Limitations: