Outputs of our weakly-supervised panoptic segmentation network, trained on only bounding boxes and image-level tags

Weakly- and Semi-Supervised Panoptic Segmentation

Qizhu Li*, Anurag Arnab*, Philip H.S. Torr

Abstract

We present a weakly supervised model that jointly performs both semantic- and instance-segmentation – a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotation for these tasks. In contrast to many popular instance segmentation approaches based on object detectors, our method does not predict any overlapping instances. Moreover, we are able to segment both “thing” and “stuff” classes, and thus explain all the pixels in the image. “Thing” classes are weakly-supervised with bounding boxes, and “stuff” with image-level tags. We obtain state-of-the-art results on Pascal VOC, for both full and weak supervision (which achieves about 95% of fully-supervised performance). Furthermore, we present the first weakly-supervised results on Cityscapes for both semantic- and instance-segmentation. Finally, we use our weakly supervised framework to analyse the relationship between annotation quality and predictive performance, which is of interest to dataset creators.

Type

Conference paper

Publication

In European Conference on Computer Vision

Date

September, 2018

Links

Preprint PDF Code Poster Appendix

News

We have updated our arXiv version to report the IoUs of our fully supervised PSPNet-initialised model in Table 5 (reproduced below).
Our poster, presented at ECCV 2018, is now available online!
Please check out our released code here!

Results

Below are our panoptic segmentation results on Cityscapes, compared with other non-overlapping instance segmentation methods. For more details, please refer to the paper.

	Validation									Test
	AP_vol^r			PQ			IoU			AP_vol^r
Method	th.	st.	all	th.	st.	all	th.	st.	all	th.
Ours (weak, ImageNet init.)	17.0	33.1	26.3	35.8	43.9	40.5	68.2	60.2	63.6	12.8
Ours (full, ImageNet init.)	24.3	42.6	34.9	39.6	52.9	47.3	70.4	72.4	71.6	18.8
Ours (full, PSPNet init.)	28.6	52.6	42.5	42.5	62.1	53.8	80.1	79.5	79.8	23.4
Pixel Encoding	9.9	–	–	–	–	–	–	–	–	8.9
RecAttend	–	–	–	–	–	–	–	–	–	9.5
InstanceCut	–	–	–	–	–	–	–	–	–	13.0
DWT	21.2	–	–	–	–	–	–	–	–	19.4
SGN	29.2	–	–	–	–	–	–	–	–	25.0

th.: thing
st.: stuff
all: mean over all thing and stuff classes
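
For readers unfamiliar with the PQ columns, the block below gives the panoptic quality metric as it is usually defined in the panoptic segmentation literature. It is included as background and assumes the table follows this standard definition; the paper remains the authoritative source for the exact evaluation protocol.

```latex
% Panoptic quality for one class (standard definition, assumed to apply here).
% TP: matched (predicted, ground-truth) segment pairs with IoU > 0.5
% FP: unmatched predicted segments; FN: unmatched ground-truth segments
\[
  \mathrm{PQ} \;=\;
  \frac{\sum_{(p,\,g) \in \mathit{TP}} \mathrm{IoU}(p, g)}
       {|\mathit{TP}| + \tfrac{1}{2}\,|\mathit{FP}| + \tfrac{1}{2}\,|\mathit{FN}|}
\]
```

Under this definition, PQ is computed per class and then averaged, which matches the th. / st. / all breakdown used in the table.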

Downloads

Image crops and tags for training the multi-class classifier:
- Images
  - train (9.3GB): Dropbox or BaiduYun
  - train_extra (63.3GB): Dropbox or BaiduYun
  - val (1.6GB): Dropbox or BaiduYun
- Ground truth tags
  - train+train_extra+val (90.9MB): Dropbox or BaiduYun
- Lists
  - train+train_extra+val (827kB): Dropbox or BaiduYun
- Semantic labels (provided for convenience; not to be used in training)
  - train (87.8MB): Dropbox or BaiduYun
  - train_extra (608MB): Dropbox or BaiduYun
  - val (16.2MB): Dropbox or BaiduYun
CAMs:
- train+train_extra (682MB): Dropbox or BaiduYun
Extracted Cityscapes bounding boxes (.mat format):
- train+val (7.6GB): Dropbox or BaiduYun
- train_extra (44.2GB): Dropbox or BaiduYun
Merged MCG&Grabcut masks:
- train+train_extra (99.8MB): Dropbox or BaiduYun
CAMs merged with MCG&Grabcut masks:
- train+train_extra (764MB): Dropbox or BaiduYun

Note that, due to the file size limit set by BaiduYun, some of the larger files had to be split into several chunks before uploading. These files are named filename.zip.part##, where filename is the original file name excluding the extension, and ## is a two-digit part index. After you have downloaded all the parts, cd to the folder where they are saved and use the following command to join them back together:

cat filename.zip.part* > filename.zip

The joining operation may take several minutes, depending on file size.

The above does not apply to files downloaded from Dropbox.
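
For the BaiduYun parts, the join step can also be wrapped in a small shell function that fails early when no chunks are present, rather than silently writing an empty archive. This is only a sketch: join_parts is a hypothetical helper name that is not part of the released code, and it assumes all parts sit in the current directory and follow the filename.zip.part## naming described above.

```shell
#!/bin/sh
# join_parts NAME: concatenate NAME.zip.part## chunks into NAME.zip.
# Hypothetical helper for illustration; not part of the released code.
join_parts() {
    name="$1"
    # Expand the glob into the positional parameters. The shell sorts the
    # matches, so the zero-padded two-digit index keeps the parts in order.
    set -- "$name".zip.part*
    # If nothing matched, the pattern is left unexpanded and names no file.
    if [ ! -e "$1" ]; then
        echo "no parts found for $name" >&2
        return 1
    fi
    # Same operation as the plain cat command above, on the checked list.
    cat "$@" > "$name.zip"
}

# Example (run inside the download folder):
#   join_parts filename
```

If unzip is installed, running unzip -t filename.zip afterwards is one way to check that the joined archive is intact.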

* Equal first authorship