Evolving weighting schemes for the Bag of Visual Words

  • Computational Intelligence for Vision and Robotics
  • Neural Computing and Applications

Abstract

The Bag of Visual Words (BoVW) is an established representation in computer vision. Taking inspiration from text mining, this representation has proved to be very effective in many domains. However, in most cases, standard term-weighting schemes are adopted (e.g., term frequency or TF-IDF). Whether alternative weighting schemes could boost the performance of BoVW-based methods remains an open question. More importantly, it is unknown whether effective weighting schemes can be learned automatically from scratch. This paper sheds some light on both questions. On the one hand, we report an evaluation of the weighting schemes most commonly used in text mining, which are rarely used in computer vision tasks. On the other hand, we propose an evolutionary algorithm capable of automatically learning weighting schemes for computer vision problems. We report empirical results of an extensive study on several computer vision problems. Results show the usefulness of the proposed method.
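The following minimal Python sketch (not the authors' code) illustrates the weighting step discussed above. It builds a BoVW histogram from visual-word assignments and then weights it with the two standard schemes, TF and TF-IDF. It assumes local descriptors have already been quantized into codebook indices. The function names are hypothetical, and TF-IDF uses the common variant idf = log(N / df), which may differ from the exact definitions evaluated in the paper.

```python
import math
from collections import Counter


def bovw_histogram(word_ids, vocab_size):
    """Count how often each visual word (codebook index) occurs in one image."""
    counts = Counter(word_ids)
    return [counts.get(j, 0) for j in range(vocab_size)]


def tf_weighting(hist):
    """Term frequency: raw counts divided by the number of visual words in the image."""
    total = sum(hist)
    return [c / total if total else 0.0 for c in hist]


def tf_idf_weighting(histograms):
    """TF-IDF: TF scaled by idf_j = log(N / df_j), where df_j counts the images containing word j."""
    n_images = len(histograms)
    vocab_size = len(histograms[0])
    df = [sum(1 for h in histograms if h[j] > 0) for j in range(vocab_size)]
    idf = [math.log(n_images / d) if d else 0.0 for d in df]
    return [[tf * w for tf, w in zip(tf_weighting(h), idf)] for h in histograms]


# Toy example: three images whose descriptors were already mapped to a 3-word codebook.
images = [[0, 0, 1], [1, 2], [1, 1, 2]]
hists = [bovw_histogram(ids, 3) for ids in images]
weighted = tf_idf_weighting(hists)
```

In this toy example, visual word 1 occurs in every image. Its IDF is therefore log(3/3) = 0, and TF-IDF gives it zero weight regardless of how frequent it is. This is an instance of the hand-crafted assumptions built into traditional schemes.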

Notes

  1. Note that the text mining community has proposed variants that aim to relax such assumptions, e.g., using n-grams [2]; still, the BoW remains very competitive with such formulations.

  2. Traditional weighting schemes were designed by researchers based on their own experience and biases, making strong assumptions and relying on intuition.

  3. In GP, each time an individual is generated, either mutation or crossover is applied, but not both. This differs from other variants, such as genetic algorithms.

  4. MATLAB files with the predefined partitions are available upon request.

  5. PHOW is an extension of the raw BoVW formulation that incorporates spatial information by means of a pyramidal structure; see [3] for details.

  6. Evaluating the fitness function is efficient, as it relies on a fast approximation of a linear SVM, so the method can be used in most computer vision applications. Moreover, the fitness function is only evaluated during the learning process, which is performed a single time and, in most cases, offline.
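Notes 3 and 6 refer to how the GP search generates and scores candidate weighting schemes. The Python sketch below illustrates only the variation step, under assumptions of our own. A weighting scheme is encoded as an expression tree over hypothetical per-word terminals (TF, IDF and a constant). Each offspring is then produced by subtree crossover or by subtree mutation, never both. The primitive set, the protected operators and the crossover probability are illustrative choices, not the configuration used in the paper. The fitness evaluation (a fast linear SVM approximation, according to note 6) is omitted.

```python
import math
import random

# Hypothetical primitive set: terminals are per-word statistics, functions are protected operators.
TERMINALS = ["TF", "IDF", 1.0]
FUNCTIONS = {"add": 2, "mul": 2, "div": 2, "log": 1}


def random_tree(rng, depth):
    """Grow a random expression tree, encoded as nested tuples (name, child, ...)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(random_tree(rng, depth - 1) for _ in range(FUNCTIONS[name]))


def evaluate(tree, stats):
    """Weight of one visual word, given its statistics, e.g. {"TF": 0.5, "IDF": 2.0}."""
    if not isinstance(tree, tuple):
        return stats[tree] if isinstance(tree, str) else tree
    name, *args = tree
    vals = [evaluate(a, stats) for a in args]
    if name == "add":
        return vals[0] + vals[1]
    if name == "mul":
        return vals[0] * vals[1]
    if name == "div":  # protected division: return 1.0 instead of dividing by zero
        return vals[0] / vals[1] if vals[1] != 0 else 1.0
    return math.log(abs(vals[0])) if vals[0] != 0 else 0.0  # protected log


def subtree_paths(tree, path=()):
    """Index paths of every node; child i of a node sits at tuple position i (1-based)."""
    paths = [path]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(subtree_paths(child, path + (i,)))
    return paths


def get_subtree(tree, path):
    """Follow an index path down to a node."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def make_offspring(rng, parent_a, parent_b, p_crossover=0.8, max_depth=3):
    """Create one child by crossover OR mutation, never both."""
    target = rng.choice(subtree_paths(parent_a))
    if rng.random() < p_crossover:
        # Subtree crossover: graft a random subtree of parent_b into parent_a.
        donor = get_subtree(parent_b, rng.choice(subtree_paths(parent_b)))
        return replace_subtree(parent_a, target, donor)
    # Subtree mutation: replace a random node of parent_a with a freshly grown subtree.
    return replace_subtree(parent_a, target, random_tree(rng, max_depth))


# TF-IDF itself is one point in this search space.
tf_idf_tree = ("mul", "TF", "IDF")
weight = evaluate(tf_idf_tree, {"TF": 0.5, "IDF": 2.0})  # 0.5 * 2.0 = 1.0
```

Applying exactly one operator per offspring mirrors the behaviour described in note 3. A genetic algorithm, by contrast, typically applies crossover and then mutation to the same child.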

References

  1. Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley, Boston

  2. Bekkerman R, Allan J (2004) Using bigrams in text categorization. Technical Report, Department of Computer Science. University of Massachusetts, Amherst, vol 1003, pp 1–2

  3. Bosch A, Zisserman A, Munoz X (2007) Image classification using random forests and ferns. In: Proceedings of the ICCV

  4. Chang KW, Roth D (2011) Selective block minimization for faster convergence of limited memory large-scale linear models. In: SIGKDD conference on knowledge discovery and data mining. ACM

  5. Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: International workshop on statistical learning in computer vision

  6. Cummins R, O’Riordan C (2006) Evolving local and global weighting schemes in information retrieval. Inf Retr 9:311–330

  7. Debole F, Sebastiani F (2003) Supervised term-weighting for automated text categorization. In: Proceedings of the 2003 ACM symposium on applied computing, SAC ’03. ACM, New York, pp 784–788

  8. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  9. Deselaers T, Pimenidis L, Ney H (2008) Bag of visual words for adult image classification and filtering. In: Proceedings of the international conference on pattern recognition. IEEE

  10. Djuric N, Lan L, Vucetic S, Wang Z (2013) BudgetedSVM: a toolbox for scalable SVM approximations. J Mach Learn Res 14:3813–3817

  11. Escalante HJ, Garcia M, Morales A, Graff M, Montes M, Morales EF, Martinez J (2015) Term-weighting learning via genetic programming for text classification. Knowl Based Syst 83:176–189

  12. Escalante HJ, Martinez-Carranza J, Escalera S, Ponce-López V, Baró X (2015) Improving bag of visual words representations with genetic programming. In: Proceedings of the 2015 international joint conference on neural networks. IEEE, pp 3674–3681

  13. Escalante HJ, Montes M, Sucar E (2012) Semantic cohesion for image annotation and retrieval. Comput Sist 10(1):121–126

  14. Escalante HJ, Sucar E, Morales E (2016) A naive Bayes baseline for early gesture recognition. Pattern Recogn Lett 73:91–99

  15. Escalera S, Baro X, Gonzalez J, Bautista MA, Madadi M, Reyes M, Ponce V, Escalante HJ, Shotton J, Guyon I (2014) ChaLearn looking at people challenge 2014: dataset and results. In: Proceedings of ECCV—chalearn workshop

  16. Fei-Fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: Proceedings of the IEEE, CVPRW

  17. Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305

  18. García-Limón M, Escalante HJ, Montes y Gómez M, Morales A, Morales E (2014) Towards the automated generation of term-weighting schemes for text categorization. In: Proceedings of GECCO Comp’14 (late-breaking abstract), pp 1459–1460

  19. Gonzalez-Gurrola LC, Moreno R, Escalante HJ, Martínez F, Carlos R (2015) Learning roadway surface disruption patterns using the bag of words representation. IEEE transactions on intelligent transportation systems (under review)

  20. Grauman K, Leibe B (2010) Visual object recognition. Morgan and Claypool, San Rafael

  21. Guyon I, Athitsos V, Jangyodsuk P, Escalante HJ (2014) The Chalearn gesture dataset (CGD 2011). Mach Vis Appl 25(8):1929–1951

  22. Hernández-Vela A, Bautista MA, Perez-Sala X, Ponce-López V, Escalera S, Baró X, Pujol O, Angulo C (2014) Probability-based dynamic time warping and bag-of-visual-and-depth-words for human gesture recognition in rgb-d. Pattern Recognit Lett 50(1):112–121

  23. Hoai M, De la Torre F (2012) Max-margin early event detectors. In: IEEE conference on computer vision and pattern recognition. IEEE, Providence, RI, pp 2863–2870

  24. Hoai M, Lan Z, De la Torre F (2011) Joint segmentation and classification of human actions in video. In: IEEE conference on computer vision and pattern recognition. IEEE, Providence, RI, pp 3265–3272

  25. Huang D, Yao S, Wang Y, De La Torre F (2014) Sequential max-margin event detectors. In: European conference on computer vision

  26. Lan M, Tan CL, Su J, Lu Y (2009) Supervised and traditional term-weighting methods for automatic text categorization. Trans PAMI 31(4):721–735

  27. Langdon WB, Poli R (2001) Foundations of genetic programming. Springer, Berlin

  28. Laptev I (2005) On space-time interest points. Int J Comput Vis 64(2–3):107–123

  29. Lazebnik S, Schmid C, Ponce J (2004) Semi-local affine parts for object recognition. In: British machine vision conference, pp 779–788

  30. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2169–2178

  31. Lazebnik S, Schmid C, Ponce J (2005) A maximum entropy framework for part-based texture and object recognition. In: IEEE international conference on computer vision, pp 832–838

  32. Lopez-Monroy AP, Montes y Gomez M, Escalante HJ, Cruz-Roa A, Gonzalez FA (2015) Improving the BoVW with discriminative n-grams and MKL. Neurocomputing 175:768–781

  33. Luke S, Panait L (2002) Lexicographic parsimony pressure. In: Proceedings of the 2002 genetic and evolutionary computation conference, pp 829–836

  34. Manchala S, Prasad VK, Janaki V (2014) GMM-based language identification system using robust features. Int J Speech Technol 17:99–105

  35. Mirza-Mohammadi M, Escalera S, Radeva P (2009) Contextual-guided bag-of-visual-words model for multi-class object categorization. In: Proceedings of the CAIP. Springer, pp 748–756

  36. Neverova N, Wolf C, Taylor GW, Nebout F (2014) Multi-scale deep learning for gesture detection and localization. In: Proceedings of the ECCV chalearn workshop on looking at people

  37. Saffari A, Guyon I (2006) Quick start guide for CLOP. Technical report, TU Graz, CLOPINET

  38. Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24:513–523

  39. Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1):1–47

  40. Sidorov G, Gelbukh A, Gomez-Adorno H, Pinto D (2014) Soft similarity and soft cosine measure: similarity of features in vector space model. Comput Sist 18(3):491–504

  41. Silva S, Almeida J (2003) GPLAB: a genetic programming toolbox for MATLAB. In: Proceedings of the Nordic MATLAB conference, pp 273–278

  42. Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. Int Conf Comput Vis 2:1470–1477

  43. Tirilly P, Claveau V, Gros P (2009) A review of weighting schemes for bag of visual words image retrieval. Technical report, IRISA

  44. Turney P, Pantel P (2010) From frequency to meaning: vector space models of semantics. J Artif Intell Res 37:141–188

  45. Vedaldi A, Fulkerson B (2010) VLFeat: an open and portable library of computer vision algorithms. In: Proceedings of the 18th ACM international conference on multimedia. ACM, pp 1469–1472

  46. Wang J, Liu P, She FH, Nahavandi M, Kouzani A (2013) Bag-of-words representation for biomedical time series classification. Biomed Signal Process Control 8(6):634–644

  47. Wang J, Liu Z, Wu Y, Yuan J (2012) Mining actionlet ensemble for action recognition with depth cameras. In: IEEE conference on computer vision and pattern recognition. IEEE, Providence, RI, pp 1290–1297

  48. Xia L, Aggarwal JK (2013) Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera. In: IEEE conference on computer vision and pattern recognition. IEEE, Portland, OR, pp 2834–2841

  49. Yoo SJ (2004) Intelligent multimedia information retrieval for identifying and rating adult images. In: Proceedings of the international conference KES, vol 3213 of LNAI, pp 164–170. Springer

  50. Zhang J, Marszałek M, Lazebnik S, Schmid C (2007) Local features and kernels for classification of texture and object categories: a comprehensive study. Int J Comput Vis 73(2):213–238

  51. Zhang K, Lan L, Wang Z, Moerchen F (2012) Scaling up kernel SVM on limited resources: a low-rank linearization approach. In: Proceedings of AISTATS 2012

Acknowledgments

This work was supported by CONACyT under Project Grant No. CB-2014-241306 (Clasificación y recuperación de imágenes mediante técnicas de minería de textos) and Spanish Ministry of Economy and Competitiveness TIN2013-43478-P. Víctor Ponce-López is supported by Fellowship No. 2013FI-B01037 and Project TIN2012-38187-C03-02.

Author information

Corresponding author

Correspondence to Hugo Jair Escalante.

Additional information

This paper is an extended and improved version of [12]; it was submitted to the Special Issue on Computational Intelligence for Vision and Robotics of the Neural Computing and Applications journal.

About this article

Cite this article

Escalante, H.J., Ponce-López, V., Escalera, S. et al. Evolving weighting schemes for the Bag of Visual Words. Neural Comput & Applic 28, 925–939 (2017). https://doi.org/10.1007/s00521-016-2223-x

  • DOI: https://doi.org/10.1007/s00521-016-2223-x
