Our paper on a multitask transformer will appear at CVPR 2022
The paper "MulT: An End-to-End Multitask Learning Transformer" by Deblina Bhattacharjee, Tong Zhang, Sabine Süsstrunk, and Mathieu Salzmann has been accepted to CVPR 2022.
IVRL member Deblina Bhattacharjee will present her work at the Conference on Computer Vision and Pattern Recognition (CVPR), taking place in New Orleans, Louisiana, from June 19 to June 24, 2022.
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. Based on the Swin transformer model, our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads. At the heart of our approach is a shared attention mechanism modeling the dependencies across the tasks. We evaluate our model on several multitask benchmarks, showing that our MulT framework outperforms both the state-of-the-art multitask convolutional neural network models and all the respective single-task transformer models.
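The overall design (a shared encoder, a shared attention map, and task-specific decoder heads) can be sketched in a few lines. This is a toy NumPy illustration under assumed shapes and random weights, not the paper's actual Swin-based implementation; the function names (`shared_encoder`, `task_head`), the task output widths, and the single-attention-map simplification are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shared_encoder(tokens, W_enc):
    # Shared representation: one encoding reused by every task head
    # (a stand-in for the Swin transformer backbone).
    return tokens @ W_enc

def shared_attention(z, W_q, W_k):
    # A single attention map reused across tasks, the sketch's analogue
    # of the shared attention mechanism modeling cross-task dependencies.
    q, k = z @ W_q, z @ W_k
    return softmax(q @ k.T / np.sqrt(k.shape[-1]))

def task_head(z, attn, W_v, W_out):
    # Task-specific decoder head: applies the shared attention map to
    # task-specific value and output projections.
    return attn @ (z @ W_v) @ W_out

n_tokens, d = 16, 8                     # toy sizes, not the paper's
image_tokens = rng.standard_normal((n_tokens, d))
W_enc = rng.standard_normal((d, d))
W_q = rng.standard_normal((d, d))
W_k = rng.standard_normal((d, d))

z = shared_encoder(image_tokens, W_enc)
attn = shared_attention(z, W_q, W_k)    # computed once, shared by all heads

# One head per task; the output widths here are arbitrary placeholders.
tasks = {"depth": 1, "segmentation": 21, "surface_normals": 3}
predictions = {
    name: task_head(z, attn,
                    rng.standard_normal((d, d)),
                    rng.standard_normal((d, out_d)))
    for name, out_d in tasks.items()
}

for name, pred in predictions.items():
    print(name, pred.shape)
```

The point of the sketch is the sharing pattern: the encoder output `z` and the attention map `attn` are computed once and reused by every head, so only the value and output projections differ per task.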