Modelling Task-Dependent Eye Guidance to Objects in Pictures

Cognitive Computation

Abstract

We introduce a model of attentional eye guidance based on the rationale that the deployment of gaze should be considered in the context of a general action-perception loop relying on two strictly intertwined processes. Sensory processing, which depends on the current gaze position, identifies the sources of information that are most valuable for the given task; motor processing links such information to the oculomotor act by sampling the next gaze position, thereby performing the gaze shift. In this framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. Task dependence is taken into account by exploiting the value and the payoff of gazing at certain image patches, or proto-objects, which provide a sparse representation of the scene objects. The different levels of the action-perception loop are represented in probabilistic form and eventually give rise to a stochastic process that generates the gaze sequence. In this way, the model also accounts for statistical properties of gaze shifts such as individual scan path variability. Simulation results are compared with experimental data derived both from publicly available datasets and from our own experiments.
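The abstract describes gaze selection as a stochastic process over proto-objects weighted by their task-dependent value. The sketch below is a minimal toy illustration of that idea only. It is not the authors' model. It rests on several assumptions. Each proto-object carries a scalar value. The probability of choosing an object as the next target is proportional to its value times a Gaussian penalty on shift amplitude. Landing positions scatter around the chosen object with Gaussian motor noise. A fixated object's value is multiplicatively reduced, a simple stand-in for discouraging immediate returns. All names (`ProtoObject`, `choice_probabilities`, `simulate_scanpath`) and parameters (`sigma`, `motor_noise`, `decay`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ProtoObject:
    """Hypothetical proto-object: a patch centre plus an assumed task-dependent value."""
    x: float
    y: float
    value: float


def choice_probabilities(gaze, protos, values, sigma):
    """Weight each proto-object by its current value times a Gaussian cost on the
    shift amplitude from the current gaze position (an assumed form), then normalise."""
    weights = [
        v * math.exp(-((p.x - gaze[0]) ** 2 + (p.y - gaze[1]) ** 2) / (2.0 * sigma ** 2))
        for p, v in zip(protos, values)
    ]
    total = sum(weights)
    if total <= 0.0:
        # No object carries any value: fall back to a uniform choice.
        return [1.0 / len(protos)] * len(protos)
    return [w / total for w in weights]


def simulate_scanpath(start, protos, n_fixations, sigma=200.0, motor_noise=5.0,
                      decay=0.5, seed=None):
    """Sample a scan path: pick a target object stochastically, land near it with
    Gaussian motor noise, and reduce the fixated object's value by `decay`."""
    rng = random.Random(seed)
    values = [p.value for p in protos]  # local copy; the inputs are not mutated
    gaze = start
    path = [start]
    targets = []
    for _ in range(n_fixations):
        probs = choice_probabilities(gaze, protos, values, sigma)
        i = rng.choices(range(len(protos)), weights=probs, k=1)[0]
        target = protos[i]
        gaze = (rng.gauss(target.x, motor_noise), rng.gauss(target.y, motor_noise))
        path.append(gaze)
        targets.append(i)
        values[i] *= decay  # assumed inhibition of the just-fixated object
    return path, targets


if __name__ == "__main__":
    # Illustrative scene: three proto-objects with made-up values (pixel coordinates).
    scene = [
        ProtoObject(100.0, 120.0, 1.0),
        ProtoObject(400.0, 300.0, 0.6),
        ProtoObject(600.0, 80.0, 0.1),
    ]
    for s in (1, 2):
        path, targets = simulate_scanpath((320.0, 240.0), scene, 5, seed=s)
        print(s, targets)
```

Every step is sampled, so different seeds give different scan paths over the same scene. This is a toy analogue of the individual scan path variability mentioned in the abstract. The Gaussian amplitude cost is an arbitrary choice made for brevity, and any other shift-amplitude distribution could be substituted.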


Acknowledgments

The authors are grateful to the Referees and the Associate Editor for their enlightening and valuable comments, which have greatly improved the quality and clarity of an earlier version of this paper. This work was partially supported by the Spanish projects TIN2011-24631, TIN2009-14633-C03-03 and CONSOLIDER INGENIO CSD2007-00018, and by the fellowships RYC-2009-05031 and 2009FIB00020. It was also supported by the Commission for Universities and Research of the Department of Innovation, Universities and Enterprise of the Generalitat of Catalonia and by the European Social Fund.

Author information

Correspondence to Giuseppe Boccignone.

About this article

Cite this article

Clavelli, A., Karatzas, D., Lladós, J. et al. Modelling Task-Dependent Eye Guidance to Objects in Pictures. Cogn Comput 6, 558–584 (2014). https://doi.org/10.1007/s12559-014-9262-3

  • DOI: https://doi.org/10.1007/s12559-014-9262-3
