Semantic segmentation from spherical cameras for audio-visual room understanding
Speaker: Teo de Campos, Associate Professor, University of Brasilia, Brasilia, Brazil.
We propose a pipeline that estimates a 3D room layout, together with object and material attribute predictions, from a pair of spherical images. Under the Manhattan world assumption, the two spherical images are automatically aligned to the global world coordinate system. Scene depth is then estimated by stereo matching. From these data we obtain cube-map projections of the RGB images, depth maps and surface normals, which are fed into a fully convolutional neural network that densely labels every pixel. The labels are mapped onto a simplified reconstruction of the scene, built by cuboid fitting. The resulting models have been used for room acoustics simulations.
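The cube-map step can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. It assumes the spherical images (and the per-pixel depth and normal maps derived from them) are stored in equirectangular format and are already aligned to the Manhattan frame. The axis convention (x right, y up, z forward), the face names and the function `equirect_to_cube_face` are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical axis convention: x right, y up, z forward ("front" looks along +z).
# Each entry maps face-plane coordinates (a, b) in [-1, 1] to a 3D viewing
# direction, with a increasing to the right of the face and b towards its top.
_FACE_DIRECTIONS = {
    "front": lambda a, b: (a, b, np.ones_like(a)),
    "back":  lambda a, b: (-a, b, -np.ones_like(a)),
    "right": lambda a, b: (np.ones_like(a), b, -a),
    "left":  lambda a, b: (-np.ones_like(a), b, a),
    "up":    lambda a, b: (a, np.ones_like(a), -b),
    "down":  lambda a, b: (a, -np.ones_like(a), b),
}


def equirect_to_cube_face(equi: np.ndarray, face: str, size: int) -> np.ndarray:
    """Resample one size x size cube face from an equirectangular image.

    Works for any per-pixel quantity stored in an (H, W, ...) array,
    e.g. RGB colours, depth values or surface-normal vectors.
    Uses nearest-neighbour sampling to keep the sketch short.
    """
    h, w = equi.shape[:2]
    # Pixel-centre coordinates on the face plane, in [-1, 1].
    t = (np.arange(size) + 0.5) * 2.0 / size - 1.0
    a, b = np.meshgrid(t, -t)  # row 0 is the top of the face (b close to +1)
    x, y, z = _FACE_DIRECTIONS[face](a, b)

    # Viewing direction -> longitude in [-pi, pi], latitude in [-pi/2, pi/2].
    lon = np.arctan2(x, z)
    lat = np.arctan2(y, np.hypot(x, z))

    # Spherical angles -> equirectangular pixel indices (longitude wraps around).
    cols = np.floor((lon + np.pi) / (2.0 * np.pi) * w).astype(int) % w
    rows = np.clip(np.floor((np.pi / 2.0 - lat) / np.pi * h).astype(int), 0, h - 1)
    return equi[rows, cols]


if __name__ == "__main__":
    equi_rgb = np.zeros((256, 512, 3), dtype=np.uint8)  # placeholder panorama
    faces = {name: equirect_to_cube_face(equi_rgb, name, 128) for name in _FACE_DIRECTIONS}
    print({name: f.shape for name, f in faces.items()})
```

Nearest-neighbour sampling is used here because it never blends values across object boundaries, which matters for depth maps; bilinear interpolation would give smoother RGB faces. Surface normals resampled this way stay expressed in the world frame rather than in each face's camera frame.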