Poselet-Based Contextual Rescoring for Human Pose Estimation via Pictorial Structures

International Journal of Computer Vision

Abstract

We propose a contextual rescoring method for predicting the position of body parts in a human pose estimation framework. A set of poselets is incorporated into the model, and their detections are used to extract spatial and score-related features relative to other body part hypotheses. We also propose a method for automatically discovering a compact subset of poselets that covers the different poses in a set of validation images while maximizing precision. The rescoring mechanism is a set-based boosting classifier that computes a new score for each body joint detection, given its relationship to detections of other body joints and mid-level parts in the image. This new score is incorporated into the pictorial structure model as an additional unary potential, following the recent work of Pishchulin et al. Experiments on two benchmarks show results comparable to those of Pishchulin et al. while reducing the size of the mid-level representation by an order of magnitude and, accordingly, the execution time by \(68~\%\).
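The abstract does not say how the compact poselet subset is found, so the following is only an illustrative sketch of one plausible approach: a greedy set-cover heuristic weighted by precision. It is not the authors' algorithm. The function name `select_poselets`, the `coverage` and `precision` inputs, and the scoring rule (newly covered images multiplied by precision) are all assumptions made for this example.

```python
from typing import Dict, List, Set


def select_poselets(
    coverage: Dict[str, Set[int]],
    precision: Dict[str, float],
    max_poselets: int,
) -> List[str]:
    """Greedily pick poselets that cover validation images, favouring precise ones.

    coverage[p]  -- hypothetical set of validation-image ids on which poselet p fires correctly
    precision[p] -- hypothetical precision of poselet p on the validation set
    """
    # Images that at least one candidate poselet can cover.
    uncovered: Set[int] = set().union(*coverage.values())
    chosen: List[str] = []

    while uncovered and len(chosen) < max_poselets:
        # Assumed trade-off: number of newly covered images times precision.
        # The poselet name is a tie-breaker so the result is deterministic.
        best = max(
            (p for p in coverage if p not in chosen),
            key=lambda p: (len(coverage[p] & uncovered) * precision[p], p),
            default=None,
        )
        # Stop when no remaining poselet covers anything new.
        if best is None or not (coverage[best] & uncovered):
            break
        chosen.append(best)
        uncovered -= coverage[best]

    return chosen


if __name__ == "__main__":
    cov = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2, 3, 4}}
    prec = {"a": 0.9, "b": 0.8, "c": 0.3}
    print(select_poselets(cov, prec, max_poselets=5))
```

In this toy input, poselet `c` covers every image but has low precision, so the heuristic prefers the two more precise poselets `a` and `b`, which together cover the same images.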

References

  • Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: People detection and articulated pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1014–1021).

  • Bourdev, L., & Malik, J. (2009). Poselets: Body part detectors trained using 3d human pose annotations. In: IEEE 12th international conference on computer vision (pp. 1365–1372).

  • Bourdev, L., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In K. Daniilidis, P. Maragos, & N. Paragios (Eds.), ECCV 2010, Lecture notes in computer science (vol. 6316, pp. 168–181). Berlin: Springer.

  • Chen, X., & Yuille, A. (2014). Articulated pose estimation with image-dependent preference on pairwise relations. NIPS.

  • Cinbis, R., & Sclaroff, S. (2012). Contextual object detection using set-based classification. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, & C. Schmid (Eds.), ECCV 2012, Lecture notes in computer science (vol. 7577, pp. 43–57). Berlin: Springer.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In: IEEE computer society conference on computer vision and pattern recognition (CVPR) (vol. 1, pp. 886–893).

  • Duan, K., Batra, D., & Crandall, D. (2012). A multi-layer composite model for human pose estimation. In: Proceedings of the british machine vision conference. BMVA press (pp. 116.1–116.11).

  • Eichner, M., & Ferrari, V. (2012). Appearance sharing for collective human pose estimation. In K. Lee, Y. Matsushita, J. Rehg, & Z. Hu (Eds.), ACCV 2012, Lecture notes in computer science (vol. 7724, pp. 138–151). Berlin: Springer.

  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.

  • Ferrari, V., Marin-Jimenez, M., & Zisserman, A. (2008a). Progressive search space reduction for human pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Ferrari, V., Marin-Jimenez, M., & Zisserman, A. (2008b). Progressive search space reduction for human pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Hernández-Vela, A., Sclaroff, S., & Escalera, S. (2014). Contextual rescoring for human pose estimation. In: Proceedings of the british machine vision conference (to be published).

  • Johnson, S., & Everingham, M. (2010). Clustered pose and nonlinear appearance models for human pose estimation. In: Proceedings of the british machine vision conference. BMVA press (pp. 12.1–12.11).

  • Johnson, S., & Everingham, M. (2011). Learning effective human pose estimation from inaccurate annotation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1465–1472).

  • Pishchulin, L., Andriluka, M., Gehler, P., & Schiele, B. (2013a). Poselet conditioned pictorial structures. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 588–595).

  • Pishchulin, L., Andriluka, M., Gehler, P., & Schiele, B. (2013b). Strong appearance and expressive spatial models for human pose estimation. In: IEEE international conference on computer vision (ICCV) (pp. 3487–3494).

  • Puertas, E., Bautista, M.A., Sanchez, D., Escalera, S., & Pujol, O. (2014). Learning to segment humans by stacking their body parts. In: ECCV 2014 workshops (In press).

  • Ramakrishna, V., Munoz, D., Hebert, M., Bagnell, J.A., & Sheikh, Y. (2014). Pose machines: Articulated pose estimation via inference machines. In: Computer vision—ECCV 2014. Springer (pp. 33–47).

  • Ramanan, D. (2007). Learning to parse images of articulated bodies. In: B. Schölkopf, J. Platt, T. Hoffman (Eds.), Advances in neural information processing systems 19, MIT press (pp. 1129–1136).

  • Sapp, B., Jordan, C., & Taskar, B. (2010). Adaptive pose priors for pictorial structures. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 422–429).

  • Sun, M., Telaprolu, M., Lee, H., & Savarese, S. (2012). An efficient branch-and-bound algorithm for optimal human pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1616–1623).

  • Tian, T. P., & Sclaroff, S. (2010). Fast globally optimal 2d human detection with loopy graph models. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 81–88).

  • Tian, Y., Zitnick, C., & Narasimhan, S. (2012). Exploring the spatial hierarchy of mixture models for human pose estimation. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, & C. Schmid (Eds.), ECCV 2012, Lecture notes in computer science (vol. 7576, pp. 256–269). Berlin: Springer.

  • Tompson, J.J., Jain, A., LeCun, Y., & Bregler, C. (2014). Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in neural information processing systems (pp. 1799–1807).

  • Toshev, A., & Szegedy, C. (2014). Deeppose: Human pose estimation via deep neural networks. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1653–1660).

  • Wang, F., & Li, Y. (2013). Beyond physical connections: Tree models in human pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 596–603).

  • Wang, Y., Tran, D., & Liao, Z. (2011). Learning hierarchical poselets for human parsing. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1705–1712).

  • Yang, Y., & Ramanan, D. (2013). Articulated human detection with flexible mixtures of parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2878–2890.

  • Yao, B., & Fei-Fei, L. (2010). Modeling mutual context of object and human pose in human-object interaction activities. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 17–24).

Acknowledgments

This work has been partly funded by grants #1029430 and #0910908 from the US National Science Foundation, and research project ref. TIN2013-43478-P from the Spanish Ministry of Economy and Competitiveness. The work of Antonio is supported by an FPU fellowship from the Spanish government.

Author information

Corresponding author

Correspondence to Antonio Hernández-Vela.

Additional information

Communicated by Deva Ramanan.

About this article

Cite this article

Hernández-Vela, A., Sclaroff, S. & Escalera, S. Poselet-Based Contextual Rescoring for Human Pose Estimation via Pictorial Structures. Int J Comput Vis 118, 49–64 (2016). https://doi.org/10.1007/s11263-015-0869-y

  • DOI: https://doi.org/10.1007/s11263-015-0869-y
