Harmony Potentials

Boix, Xavier; Gonfaus, Josep M.; van de Weijer, Joost; Bagdanov, Andrew D.; Serrat, Joan; Gonzàlez, Jordi

doi:10.1007/s11263-011-0449-8

Fusing Global and Local Scale for Semantic Image Segmentation

Published: 23 April 2011

Volume 96, pages 83–102 (2012)

International Journal of Computer Vision

Xavier Boix^1,3,
Josep M. Gonfaus^1,2,
Joost van de Weijer^1,2,
Andrew D. Bagdanov^1,
Joan Serrat^1,2 &
Jordi Gonzàlez^1,2

Abstract

Hierarchical Conditional Random Field (HCRF) models have been successfully applied to a number of image labeling problems, including image segmentation. However, existing HCRF models of image segmentation do not allow multiple classes to be assigned to a single region, which limits their ability to incorporate contextual information across multiple scales. At higher scales in the image, this representation yields an oversimplified model, since multiple classes can reasonably be expected to appear within large regions. This simplified model particularly limits the impact of information at higher scales. Since class-label information at these scales is usually more reliable than at lower, noisier scales, neglecting this information is undesirable. To address these issues, we propose a new consistency potential for image labeling problems, which we call the harmony potential. It can encode any possible combination of labels, penalizing only unlikely combinations of classes. We also propose an effective sampling strategy over this expanded label set that renders the underlying optimization problem tractable. Our approach obtains state-of-the-art results on two challenging, standard benchmark datasets for semantic image segmentation: PASCAL VOC 2010 and MSRC-21.
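
To make the idea of a consistency potential over label combinations concrete, the following is a minimal sketch and not the formulation from the article, which the abstract does not spell out. It assumes a simplified form in which a global node holds a set of class labels and each local node pays a fixed cost when its label is missing from that set. The function names, the `gamma` parameter, and the example labels are all hypothetical. The second function only illustrates why the expanded label set is large: with 21 classes there are 2^21 possible label subsets, which is why some form of sampling over them is needed.

```python
def consistency_cost(local_labels, global_label_set, gamma=1.0):
    """Toy consistency cost (assumed form, for illustration only).

    Each local node whose label is absent from the global label set
    pays `gamma`. Labels that are in the global set but unused by any
    local node incur no cost in this simplified version.
    """
    return sum(gamma for label in local_labels if label not in global_label_set)


def expanded_label_set_size(num_classes):
    """Number of possible label subsets (the power set) for `num_classes` classes."""
    return 2 ** num_classes


if __name__ == "__main__":
    # Hypothetical region labels and a hypothetical global label set.
    region_labels = ["cow", "grass", "cow", "car"]
    global_set = {"cow", "grass"}
    print(consistency_cost(region_labels, global_set, gamma=2.0))
    print(expanded_label_set_size(21))
```

In this sketch a global set such as {"cow", "grass"} leaves regions labeled "cow" or "grass" unpenalized while a region labeled "car" pays the cost, which is one way to read "any combination of labels is allowed, only inconsistent ones are penalized". A cost on unlikely global combinations themselves is deliberately left out.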

Author information

Authors and Affiliations

Centre de Visió per Computador, Barcelona, Spain
Xavier Boix, Josep M. Gonfaus, Joost van de Weijer, Andrew D. Bagdanov, Joan Serrat & Jordi Gonzàlez
Department of Computer Science, Universitat Autònoma de Barcelona, Barcelona, Spain
Josep M. Gonfaus, Joost van de Weijer, Joan Serrat & Jordi Gonzàlez
Computer Vision Laboratory, ETH Zurich, Zurich, Switzerland
Xavier Boix

Corresponding author

Correspondence to Xavier Boix.

Additional information

Both authors contributed equally to this work.

About this article

Cite this article

Boix, X., Gonfaus, J.M., van de Weijer, J. et al. Harmony Potentials. Int J Comput Vis 96, 83–102 (2012). https://doi.org/10.1007/s11263-011-0449-8

Received: 11 November 2010
Accepted: 04 April 2011
Published: 23 April 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s11263-011-0449-8
