Outputs of our weakly-supervised panoptic segmentation network, trained only on bounding boxes and image-level tags.

Weakly- and Semi-Supervised Panoptic Segmentation


We present a weakly supervised model that jointly performs both semantic- and instance-segmentation, a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotation for these tasks. In contrast to many popular instance segmentation approaches based on object detectors, our method does not predict any overlapping instances. Moreover, we are able to segment both “thing” and “stuff” classes, and thus explain all the pixels in the image. “Thing” classes are weakly-supervised with bounding boxes, and “stuff” with image-level tags. We obtain state-of-the-art results on Pascal VOC for both full and weak supervision, with our weakly-supervised model achieving about 95% of fully-supervised performance. Furthermore, we present the first weakly-supervised results on Cityscapes for both semantic- and instance-segmentation. Finally, we use our weakly supervised framework to analyse the relationship between annotation quality and predictive performance, which is of interest to dataset creators.

Published in the European Conference on Computer Vision (ECCV), 2018.


  • We have updated our arXiv version to report the IoUs of our fully-supervised, PSPNet-initialised model in Table 5 (reproduced below).
  • Our ECCV 2018 poster is now available online!
  • Please check out our released code here!


Below are our panoptic segmentation results on Cityscapes, compared with other instance segmentation methods that also produce non-overlapping instances. For more details, please refer to the paper.

                              Validation                                               Test
                              AP^r_vol           PQ                 IoU                AP^r_vol
Method                        th.   st.   all    th.   st.   all    th.   st.   all    th.
Ours (weak, ImageNet init.)   17.0  33.1  26.3   35.8  43.9  40.5   68.2  60.2  63.6   12.8
Ours (full, ImageNet init.)   24.3  42.6  34.9   39.6  52.9  47.3   70.4  72.4  71.6   18.8
Ours (full, PSPNet init.)     28.6  52.6  42.5   42.5  62.1  53.8   80.1  79.5  79.8   23.4
Pixel Encoding                 9.9    -     -      -     -     -      -     -     -     8.9
RecAttend                       -     -     -      -     -     -      -     -     -     9.5
InstanceCut                     -     -     -      -     -     -      -     -     -    13.0
DWT                           21.2    -     -      -     -     -      -     -     -    19.4
SGN                           29.2    -     -      -     -     -      -     -     -    25.0

th.: thing classes
st.: stuff classes
all: mean over all thing and stuff classes
-: not reported
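For reference, the metrics in the table can be summarised as below. This is a hedged restatement of the standard definitions, assuming the table follows them: PQ is the panoptic quality computed per class c and averaged over the set of classes C in each column group (thing, stuff or all), where a prediction p and ground-truth segment g form a true positive when their IoU exceeds 0.5; AP^r_vol is taken to be the region-level AP^r averaged over IoU thresholds 0.1, 0.2, ..., 0.9. Please refer to the paper for the exact evaluation protocol.

```latex
\begin{aligned}
\mathrm{PQ}_c &= \frac{\sum_{(p,g)\in\mathit{TP}_c}\mathrm{IoU}(p,g)}
                      {|\mathit{TP}_c| + \tfrac{1}{2}|\mathit{FP}_c| + \tfrac{1}{2}|\mathit{FN}_c|},
\qquad (p,g)\in\mathit{TP}_c \iff \mathrm{IoU}(p,g) > 0.5 \\
\mathrm{PQ} &= \frac{1}{|C|}\sum_{c\in C}\mathrm{PQ}_c \\
\mathrm{AP}^{r}_{\mathrm{vol}} &= \frac{1}{9}\sum_{k=1}^{9}\mathrm{AP}^{r}(\tau_k),
\qquad \tau_k = 0.1\,k
\end{aligned}
```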


  1. Image crops and tags for training the multi-class classifier:
  2. Class activation maps (CAMs):
  3. Extracted Cityscapes bounding boxes (.mat format):
  4. Merged MCG and GrabCut masks:
  5. CAMs merged with MCG and GrabCut masks:

Note that, because of the file-size limit imposed by BaiduYun, some of the larger files had to be split into several chunks before uploading. These chunks are named filename.zip.part##, where filename is the original file name without its extension and ## is a two-digit part index. After downloading all the parts, cd to the folder where they are saved and run the following command to join them back together:

cat filename.zip.part* > filename.zip

The joining operation may take several minutes, depending on file size.

Files downloaded from Dropbox are not split, so this joining step does not apply to them.
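When joining BaiduYun chunks, a missing part is easy to overlook, and `cat` will still happily produce a (corrupt) zip. The function below is a minimal bash sketch of a safer join, assuming bash (for arrays and `shopt`) and that all parts sit in the current directory; the name `join_parts` is hypothetical and not part of our released code. It checks that the two-digit indices are consecutive before concatenating.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the released code: rebuild NAME.zip from
# its chunks NAME.zip.part## and stop if a chunk index is missing.
shopt -s nullglob   # an unmatched glob expands to nothing instead of itself

join_parts() {
  local name="$1"   # original file name, without the .zip extension
  # Glob results are sorted, and two-digit indices sort in numeric order.
  local parts=( "$name".zip.part[0-9][0-9] )

  if [ "${#parts[@]}" -eq 0 ]; then
    echo "no parts found for $name.zip" >&2
    return 1
  fi

  # Require consecutive indices, starting from whichever index comes first.
  local expected=$((10#${parts[0]##*.part}))   # 10# forces base 10 for "08", "09"
  local p idx
  for p in "${parts[@]}"; do
    idx=$((10#${p##*.part}))
    if [ "$idx" -ne "$expected" ]; then
      printf 'missing part %02d for %s.zip\n' "$expected" "$name" >&2
      return 1
    fi
    expected=$((idx + 1))
  done

  cat "${parts[@]}" > "$name.zip"
  echo "joined ${#parts[@]} parts into $name.zip"
}
```

For example, `join_parts filename && unzip -t filename.zip` joins the chunks and then tests the resulting archive without extracting it. The index check cannot notice a missing final chunk, which is one reason to follow it with `unzip -t`.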

* Equal first authorship