DOI: 10.1145/3133202.3133204
research-article

Semantic Summarization of Egocentric Photo Stream Events

Published: 23 October 2017

ABSTRACT

With the rapid increase in the number of wearable-camera users in recent years, and in the amount of data they produce, there is a strong need for automatic retrieval and summarization techniques. This work addresses the problem of automatically summarizing egocentric photo streams captured by a wearable camera from an image retrieval perspective. After non-informative images are removed by a new CNN-based filter, the remaining images are ranked by relevance to ensure semantic diversity and finally re-ranked by a novelty criterion to reduce redundancy. To assess the results, a new evaluation metric is proposed that takes into account the non-uniqueness of the solution. Experiments on a database of 7,110 images from 6 different subjects, evaluated by experts, yielded 95.74% expert satisfaction and a Mean Opinion Score of 4.57 out of 5.0.
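The three stages named in the abstract (filter, rank by relevance, re-rank by novelty) can be illustrated with a minimal sketch. The abstract does not give the scoring functions, so this is not the authors' method: the per-image `informativeness` and `relevance` scores, the `min_info` threshold, and the `trade_off` weight are all hypothetical inputs, and the re-ranking step is a generic greedy scheme in the style of Maximal Marginal Relevance, swapped in for illustration.

```python
import numpy as np


def summarize(features, informativeness, relevance, k, min_info=0.5, trade_off=0.7):
    """Pick up to k image indices: filter, rank by relevance, re-rank by novelty.

    All scores are assumed to be precomputed; how the paper derives them
    is not stated in the abstract.
    """
    features = np.asarray(features, dtype=float)
    informativeness = np.asarray(informativeness, dtype=float)
    relevance = np.asarray(relevance, dtype=float)

    # Stage 1: drop images whose (hypothetical) informativeness score is too low.
    candidates = [i for i in range(len(features)) if informativeness[i] >= min_info]

    # Unit-normalise features so that dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)

    # Stages 2 and 3: greedily pick the image with the best balance of
    # relevance and novelty with respect to what is already selected.
    selected = []
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for i in candidates:
            # Redundancy is the highest similarity to any selected image.
            redundancy = max((float(unit[i] @ unit[j]) for j in selected), default=0.0)
            score = trade_off * relevance[i] - (1.0 - trade_off) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy event: images 0 and 1 are near-duplicates, image 3 is non-informative.
picked = summarize(
    features=[[1, 0], [1, 0], [0, 1], [1, 1]],
    informativeness=[0.9, 0.9, 0.9, 0.1],
    relevance=[0.9, 0.8, 0.5, 1.0],
    k=2,
)
print(picked)  # [0, 2]
```

In the toy run, image 3 is filtered out despite having the highest relevance, and image 1 is skipped in favour of image 2 because it duplicates the already selected image 0. Raising `trade_off` toward 1.0 would weight relevance more heavily and let the duplicate back in.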

Published in

LTA '17: Proceedings of the 2nd Workshop on Lifelogging Tools and Applications
October 2017
40 pages
ISBN: 9781450355032
DOI: 10.1145/3133202

Copyright © 2017 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

LTA '17 paper acceptance rate: 2 of 3 submissions, 67%. Overall acceptance rate: 6 of 10 submissions, 60%.
