Speaker: José Álvarez, researcher at NICTA, Canberra, ACT, Australia
Abstract: There is increasing interest in exploiting multiple images for scene understanding, with great progress in areas such as cosegmentation and video segmentation. Jointly analyzing the images in a large set offers the opportunity to exploit a greater source of information than when considering a single image on its own. However, this also yields challenges, since, to effectively exploit all the available information, the resulting methods need to consider not just local connections, but efficiently analyze similarity between all pairs of pixels within and across all the images. In this paper, we propose to model an image set as a fully-connected pairwise Conditional Random Field (CRF) defined over the image pixels, or superpixels, with Gaussian edge potentials. We show that this lets us co-label the images of a large set efficiently, thus yielding increased accuracy at no additional computational cost compared to sequential labeling of the images. Furthermore, we show that our model can be applied to the semi-supervised case, where we jointly consider labeled and unlabeled data in the CRF. This allows us to either entirely bypass the time-consuming computation of unary terms, or to exploit unaries computed at sparse image locations. Our experimental evaluation demonstrates that our framework lets us handle over ten thousand images in a matter of seconds.
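To make the model concrete, the following is a minimal sketch of co-labeling with a fully-connected pairwise CRF with Gaussian edge potentials. It is an illustration under stated assumptions, not the speaker's implementation: the abstract does not name the inference procedure, so naive mean-field inference with a Potts label compatibility is assumed here, and the dense kernel is computed explicitly at O(N^2) cost, whereas handling thousands of images would require a much faster approximation. The names `gaussian_kernel`, `mean_field`, `bandwidth`, and `weight` are hypothetical. Nodes from all images are pooled into one list; unlabeled nodes in the semi-supervised setting simply get all-zero unaries.

```python
import math


def gaussian_kernel(f_i, f_j, bandwidth):
    """Gaussian similarity between two feature vectors (hypothetical helper)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(f_i, f_j))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field(unaries, features, bandwidth=1.0, weight=1.0, n_iters=10):
    """Naive mean-field inference for a fully-connected CRF (assumed method).

    unaries[i][l] is the cost of label l at node i (all zeros if no unary
    is available); features[i] is the feature vector of node i. Nodes may
    come from any image in the set. Returns approximate marginals q[i][l].
    """
    n = len(unaries)
    n_labels = len(unaries[0])
    # Dense kernel over ALL pairs of nodes, within and across images.
    kernel = [
        [0.0 if i == j else gaussian_kernel(features[i], features[j], bandwidth)
         for j in range(n)]
        for i in range(n)
    ]
    # Initialise marginals from the unaries alone.
    q = [softmax([-u for u in row]) for row in unaries]
    for _ in range(n_iters):
        new_q = []
        for i in range(n):
            # Message passing: kernel-weighted sum of neighbours' marginals.
            msg = [sum(kernel[i][j] * q[j][l] for j in range(n))
                   for l in range(n_labels)]
            total = sum(msg)
            # Potts compatibility: label l is penalised by the mass that
            # similar nodes assign to every other label.
            scores = [-unaries[i][l] - weight * (total - msg[l])
                      for l in range(n_labels)]
            new_q.append(softmax(scores))
        q = new_q
    return q


# Toy semi-supervised example: six nodes with 1-D features. Only nodes 0
# and 3 carry an informative unary; the rest are unlabeled.
features = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
unaries = [[0.0, 5.0], [0.0, 0.0], [0.0, 0.0],
           [5.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
marginals = mean_field(unaries, features)
labels = [max(range(len(m)), key=m.__getitem__) for m in marginals]
print(labels)
```

In this toy case the unlabeled nodes inherit the label of the nearby labeled node purely through the pairwise term, which mirrors the abstract's point that unaries need only be available at sparse locations.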