Tuesday, May 12th, 2015 at 2:00PM
Speaker: Iasonas Kokkinos, associate professor at Ecole Centrale Paris, Chatenay-Malabry, France
The application of graphical models, such as Deformable Part Models (DPMs), to high-level vision tasks has largely been facilitated by the adoption of loop-free graph structures, e.g. tree- or star-shaped graphs, which enable exact and efficient inference with Dynamic Programming, or even Branch-and-Bound. Still, a host of applications exhibit richer structure that cannot be captured by loop-free graphical models. In such cases, it may be advantageous to perform approximate inference with richer models, rather than exact inference with simpler ones. In this talk we will consider two such cases, where approximate inference with loopy models delivers state-of-the-art results.
In the first part of the talk we will see how the Alternating Direction Method of Multipliers (ADMM) can be used to accommodate loops in the model's graph structure. For this we decompose a loopy graph into a set of loop-free "slaves" and use a master-slave scheme to ensure that the slave solutions are consistent. We demonstrate substantial acceleration over Dual Decomposition, which often fails to converge. On a challenging medical shape segmentation benchmark our ADMM-based technique yields substantially better results than the previous state-of-the-art, while typically converging in fewer than 10 iterations.
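As a rough illustration of the master-slave idea only, the sketch below runs consensus ADMM (in its scaled form) on a toy continuous problem. It is not the speaker's method: the function name consensus_admm, the scalar quadratic slave costs 0.5 * a[i] * (x - b[i])**2, and the parameters rho and iters are all assumptions made for this example, whereas the slaves in the talk are loop-free graphical models solved by exact inference.

```python
def consensus_admm(a, b, rho=1.0, iters=500):
    """Toy consensus ADMM: minimize sum_i 0.5 * a[i] * (x - b[i])**2.

    Hypothetical example; each "slave" i owns one quadratic term and the
    "master" variable z forces all slaves to agree on a single value.
    """
    n = len(a)
    z = 0.0        # master variable shared by all slaves
    u = [0.0] * n  # scaled dual variables, one per slave
    x = [0.0] * n  # local slave solutions
    for _ in range(iters):
        # Slave step: each subproblem is solved independently (closed form here).
        x = [(a[i] * b[i] + rho * (z - u[i])) / (a[i] + rho) for i in range(n)]
        # Master step: average the slaves' dual-shifted proposals.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: accumulate each slave's disagreement with the master.
        u = [u[i] + x[i] - z for i in range(n)]
    return z, x


if __name__ == "__main__":
    z, x = consensus_admm([1.0, 2.0, 3.0], [0.0, 1.0, 4.0])
    print(z, x)
```

For this toy problem the consensus value approaches the weighted mean sum(a[i] * b[i]) / sum(a[i]), and the slave solutions approach the same value, which is the consistency that the master-slave scheme is meant to enforce.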
In the second part of the talk we will cover more recent advances in integrating Deep Learning with object detection and semantic segmentation. We will start with our efforts on dealing with scale-invariance in image classification, move on to treating scale- and aspect-variation in object detection, and finally turn to fully-connected models for semantic image segmentation. The common theme in all these works is the use of fully convolutional neural networks as generic visual front-ends, replacing the commonly hand-engineered pipelines. We will demonstrate that we can obtain state-of-the-art results in all three tasks while employing fairly straightforward processing pipelines.