People
Xinlei Chen, Abhinav Gupta
Abstract
We present an approach that utilizes large amounts of web
data for learning CNNs. Specifically, inspired by curriculum
learning, we present a two-step approach to CNN training.
First, we use easy images to train an initial visual representation.
We then use this initial CNN and adapt it to harder,
more realistic images by leveraging the structure of data
and categories. We demonstrate that our two-stage CNN
outperforms a fine-tuned CNN trained on ImageNet on Pascal
VOC 2012. We also demonstrate the strength of webly
supervised learning by localizing objects in web images and
training an R-CNN-style [20] detector. It achieves the best
performance on VOC 2007 among methods that use no VOC training
data. Finally, we show that our approach is robust to noise
and performs comparably even when we use image search
results from March 2013 (pre-CNN image search era).
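The two-step curriculum above can be illustrated with a toy sketch: a pure-Python logistic regression stands in for the CNN, "easy" points stand in for clean search-engine images, and "hard" points near the decision boundary stand in for realistic photos. All data and names here are hypothetical and unrelated to the paper's actual Caffe pipeline; the point is only the schedule — fit on easy data first, then use those weights to initialize training on hard data.

```python
import math

def train(data, w=None, b=0.0, epochs=200, lr=0.5):
    """One stage of logistic-regression training by plain gradient descent."""
    if w is None:
        w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            z = w[0] * x[0] + w[1] * x[1] + b
            # Clamped logit keeps math.exp safe for extreme values.
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            g = p - y  # gradient of the log-loss w.r.t. the logit
            w = [w[0] - lr * g * x[0], w[1] - lr * g * x[1]]
            b -= lr * g
    return w, b

# Stage 1: "easy", well-separated points (stand-ins for clean search results).
easy = [((2.0, 2.0), 1), ((-2.0, -2.0), 0), ((2.5, 1.5), 1), ((-1.5, -2.5), 0)]
# Stage 2: "hard" points near the boundary (stand-ins for realistic photos).
hard = [((0.4, 0.3), 1), ((-0.3, -0.4), 0), ((0.2, 0.5), 1), ((-0.5, -0.1), 0)]

w, b = train(easy)            # step 1: initial representation from easy data
w, b = train(hard, w=w, b=b)  # step 2: adapt the same weights to harder data
```

The design point mirrors the paper's: the second stage does not start from scratch, it fine-tunes the representation learned from the easy data.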
Keywords
- Convolutional Networks, Region CNN
- Subcategory, Object Discovery, Weakly Supervised Object Localization
- Vision for the Web, Webly-Supervised
Paper
ICCV paper (pdf, 2.0MB)
Supplementary Material (pdf, 2.1MB)
Slides (pptx, 30MB)
Poster (pdf, 27MB)
YouTube Video
Talk
Citation
Xinlei Chen and Abhinav Gupta. Webly Supervised Learning of Convolutional Networks. In ICCV 2015.
@inproceedings{chen_iccv15,
  Author    = {Xinlei Chen and Abhinav Gupta},
  Title     = {{W}ebly {S}upervised {L}earning of {C}onvolutional {N}etworks},
  Booktitle = {International Conference on Computer Vision (ICCV)},
  Year      = {2015},
}
Downloads
We provide the images and queries used in the paper.
- Google Images: all Google images used to train the networks here.
- Flickr Images: all Flickr images used to train/fine-tune the networks here.
- Google List of Objects and Attributes: list of objects and attributes downloaded from Google here.
- Google List of Objects, Attributes and Scenes: list of objects, attributes and scenes downloaded from Google here.
- Flickr List of Objects and Attributes: list of objects and attributes downloaded from Flickr (a subset of the Google list, since some classes do not have enough Flickr images) here.
We also provide the pre-trained Caffe models in the paper (trained in April 2015, and converted to Caffe2 in March 2018).
- GoogleO: network and definition files trained only on Google object and attribute images.
- FlickrG: network and definition files fine-tuned with Flickr images.
- GoogleA: network and definition files trained only on Google object, attribute, and scene images.
Funding
This research was supported by:
- ONR N000141010934
- Yahoo!
- Nvidia GPU Donations
- Google
- XC is supported by a Yahoo InMind Fellowship
- AG is supported by a Bosch Young Faculty Fellowship