Why self-supervised learning or learning from extra data?
Initial work focused on pretext (proxy) tasks; recent work has shifted to contrastive learning.
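As a concrete reference for what "contrastive learning" means here, below is a minimal NT-Xent-style loss (the SimCLR family) in PyTorch. This is an illustrative sketch, not any paper's exact implementation; the function name and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive (NT-Xent-style) loss over two augmented views.

    z1, z2: [N, D] embeddings of the same N images under two augmentations.
    Each pair (z1[i], z2[i]) is a positive; every other embedding in the
    batch serves as a negative.
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)    # [2N, D], unit-norm
    sim = z @ z.t() / temperature                         # [2N, 2N] scaled cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))            # exclude self-similarity
    # The positive for sample i is sample i + N (and vice versa).
    targets = torch.cat([torch.arange(n, device=z.device) + n,
                         torch.arange(n, device=z.device)])
    return F.cross_entropy(sim, targets)
```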
Self-training with Noisy Student Improves ImageNet Classification
Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL)
Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko
Objective:
Moving away from negative samples in contrastive learning
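A minimal sketch of that idea: an online network plus a small predictor is trained to match a slowly moving target network, with a stop-gradient on the target branch and no negative pairs. Names such as `BYOLSketch`, `online`, and `predictor` are illustrative, and `encoder` stands in for the backbone plus projection head; this is not the authors' code.

```python
import copy
import torch
import torch.nn.functional as F

class BYOLSketch(torch.nn.Module):
    """BYOL-style training without negative samples (sketch)."""
    def __init__(self, encoder, predictor, ema_decay=0.99):
        super().__init__()
        self.online = encoder                      # trained by gradient descent
        self.predictor = predictor                 # small MLP, online side only
        self.target = copy.deepcopy(encoder)       # EMA copy, never backpropagated
        for p in self.target.parameters():
            p.requires_grad = False
        self.ema_decay = ema_decay

    def loss(self, view1, view2):
        # Online predictions of the target's embeddings, symmetrized over views.
        p1 = F.normalize(self.predictor(self.online(view1)), dim=-1)
        p2 = F.normalize(self.predictor(self.online(view2)), dim=-1)
        with torch.no_grad():                      # stop-gradient on the target branch
            t1 = F.normalize(self.target(view1), dim=-1)
            t2 = F.normalize(self.target(view2), dim=-1)
        # Negative cosine similarity between predictions and target embeddings.
        return -(p1 * t2).sum(dim=-1).mean() - (p2 * t1).sum(dim=-1).mean()

    @torch.no_grad()
    def update_target(self):
        # target <- decay * target + (1 - decay) * online, after each optimizer step.
        for po, pt in zip(self.online.parameters(), self.target.parameters()):
            pt.mul_(self.ema_decay).add_(po, alpha=1 - self.ema_decay)
```

The predictor, the stop-gradient, and the slowly moving target are what the paper credits with preventing the representations from collapsing in the absence of negatives.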
Big Self-Supervised Models are Strong Semi-Supervised Learners (SimCLRv2)
Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton
Contributions and findings:
Rethinking Pre-training and Self-training
Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le
For example, on the COCO object detection dataset, pre-training helps when we use one fifth of the labeled data, but hurts accuracy when we use all of the labeled data. Self-training, in contrast, shows positive improvements of +1.3 to +3.4 AP across all dataset sizes.
Self-training helps in high-data/strong-augmentation regimes, even when pre-training hurts.
Self-training works across dataset sizes and is additive to pre-training (a generic pseudo-labeling loop is sketched after these findings).
Self-supervised pre-training also hurts in the same high-data/strong-augmentation regimes where self-training helps.
An intuition for the weak performance of pre-training is that pre-training is not aware of the task of interest and can fail to adapt. Such adaptation is often needed when switching tasks because, for example, good features for ImageNet may discard positional information which is needed for COCO.
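The self-training referred to in these findings is the standard pseudo-labeling recipe: train a teacher on the labeled data, pseudo-label the unlabeled data, then train a student on both. Below is a generic sketch, not the exact Zoph et al. pipeline; `train_fn` and the confidence threshold are hypothetical.

```python
import torch

def self_training(teacher, student, labeled_batches, unlabeled_batches,
                  train_fn, threshold=0.5):
    """Generic self-training loop (pseudo-labeling sketch).

    labeled_batches: iterable of (images, labels); unlabeled_batches: iterable of images.
    train_fn(model, batches) is a hypothetical supervised training routine.
    """
    # 1. Train the teacher on human labels only.
    train_fn(teacher, list(labeled_batches))

    # 2. Pseudo-label unlabeled images with the teacher, keeping confident predictions.
    pseudo_batches = []
    teacher.eval()
    with torch.no_grad():
        for images in unlabeled_batches:
            probs = torch.softmax(teacher(images), dim=-1)
            conf, labels = probs.max(dim=-1)
            keep = conf > threshold                # drop low-confidence predictions
            if keep.any():
                pseudo_batches.append((images[keep], labels[keep]))

    # 3. Train the student on labeled + pseudo-labeled data
    #    (strong augmentation would be applied inside train_fn).
    train_fn(student, list(labeled_batches) + pseudo_batches)
    return student
```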
Conclusion:
Self-Supervised Learning Through the Eyes of a Child
A. Emin Orhan, Vaibhav V. Gupta, Brenden M. Lake
SAYCam video dataset, collected from head-mounted cameras worn by children as they grow.
3 SSL techniques:
The model also appears to learn from contextual cues, since its activation maps extend well beyond the object of interest.
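The activation maps referred to here are presumably class activation maps (CAMs). As a hypothetical illustration of how such a map is computed (not the paper's visualization code), assuming a model with global average pooling followed by a linear classifier:

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, classifier_weights, class_idx):
    """Class activation map (CAM) sketch.

    features: [C, H, W] feature maps from the last conv layer.
    classifier_weights: [num_classes, C] weights of the final linear layer.
    Returns an [H, W] map of which locations drive `class_idx`; maps that
    light up well beyond the object itself suggest reliance on context.
    """
    cam = torch.einsum('c,chw->hw', classifier_weights[class_idx], features)
    cam = F.relu(cam)                       # keep positive evidence only
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)         # normalize to [0, 1] for display
```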
Limitations: