Harmony Potentials

Fusing Global and Local Scale for Semantic Image Segmentation

International Journal of Computer Vision

Abstract

Hierarchical Conditional Random Field (HCRF) models have been successfully applied to a number of image labeling problems, including image segmentation. However, existing HCRF models of image segmentation do not allow multiple classes to be assigned to a single region, which limits their ability to incorporate contextual information across multiple scales. At higher scales in the image, this representation yields an oversimplified model, since multiple classes can reasonably be expected to appear within large regions. This simplification particularly limits the impact of information at higher scales. Since class-label information at these scales is usually more reliable than at lower, noisier scales, neglecting it is undesirable. To address these issues, we propose a new consistency potential for image labeling problems, which we call the harmony potential. It can encode any possible combination of labels, penalizing only unlikely combinations of classes. We also propose an effective sampling strategy over this expanded label set that renders the underlying optimization problem tractable. Our approach obtains state-of-the-art results on two challenging, standard benchmark datasets for semantic image segmentation: PASCAL VOC 2010 and MSRC-21.
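
As a rough illustration of the idea in the abstract, the following toy sketch lets a global node take a subset of class labels rather than a single label, and charges a local node a cost only when its label is missing from that subset. The abstract does not give the functional form, so everything here is an assumption made for illustration: the class names, the constant penalty `gamma`, the `subset_cost` stand-in for how unlikely a label combination is, and the brute-force search, which replaces the paper's sampling strategy and is only feasible because the label set is tiny.

```python
from itertools import combinations

# Hypothetical class names; real benchmarks use about 21 classes.
LABELS = ("sky", "grass", "cow")


def power_set(labels):
    """All subsets of the label set: the expanded label space of the global node."""
    return [frozenset(c)
            for r in range(len(labels) + 1)
            for c in combinations(labels, r)]


def harmony_penalty(local_label, global_subset, gamma=1.0):
    """Assumed form: cost gamma if the local label is absent from the subset, else 0."""
    return 0.0 if local_label in global_subset else gamma


def best_global_subset(local_labels, subset_cost, gamma=1.0):
    """Brute-force search over all subsets for fixed local labels.

    subset_cost is a hypothetical callable scoring how unlikely a label
    combination is. The search visits 2**len(LABELS) subsets, which is why
    some form of sampling is needed for realistic label sets.
    """
    def energy(subset):
        consistency = sum(harmony_penalty(l, subset, gamma) for l in local_labels)
        return subset_cost(subset) + consistency

    return min(power_set(LABELS), key=energy)


if __name__ == "__main__":
    regions = ["cow", "cow", "grass", "sky"]
    # Cheap subsets: keeping every observed class beats paying penalties.
    print(sorted(best_global_subset(regions, lambda s: 0.5 * len(s))))
```

With a cheap per-class cost, the global node keeps every class seen in the regions. Raising that cost makes it drop weakly supported classes and absorb the resulting penalties. A single-label global node cannot express this trade-off.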

Author information

Corresponding author

Correspondence to Xavier Boix.

Additional information

Both authors contributed equally to this work.

About this article

Cite this article

Boix, X., Gonfaus, J.M., van de Weijer, J. et al. Harmony Potentials. Int J Comput Vis 96, 83–102 (2012). https://doi.org/10.1007/s11263-011-0449-8

  • DOI: https://doi.org/10.1007/s11263-011-0449-8
