research-article

Semantic Summarization of Egocentric Photo Stream Events

Authors:
Aniol Lidon, Universitat Politecnica de Catalunya, Barcelona, Spain
Marc Bolaños, Universitat de Barcelona, Barcelona, Spain
Mariella Dimiccoli, Universitat de Barcelona, Barcelona, Spain
Petia Radeva, Universitat de Barcelona, Barcelona, Spain
Maite Garolera, Consorci Sanitari de Terrassa, Terrassa, Spain
Xavier Giro-i-Nieto, Universitat Politecnica de Catalunya, Barcelona, Spain

LTA '17: Proceedings of the 2nd Workshop on Lifelogging Tools and Applications, October 2017, Pages 3–11
https://doi.org/10.1145/3133202.3133204

Published: 23 October 2017

ABSTRACT

With the rapid growth in recent years in the number of wearable-camera users and in the volume of data they produce, there is a strong need for automatic retrieval and summarization techniques. This work addresses the problem of automatically summarizing egocentric photo streams captured with a wearable camera by taking an image retrieval perspective. After non-informative images are removed with a new CNN-based filter, the remaining images are ranked by relevance to ensure semantic diversity and finally re-ranked by a novelty criterion to reduce redundancy. To assess the results, a new evaluation metric is proposed that takes into account the non-uniqueness of the solution. Experiments on a dataset of 7,110 images from 6 different subjects, evaluated by experts, yielded 95.74% expert satisfaction and a Mean Opinion Score of 4.57 out of 5.0.
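
The three stages named in the abstract (filter, rank by relevance, re-rank by novelty) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the dictionary keys, the `min_informativeness` threshold, the `trade_off` weight, and the use of a greedy MMR-style score with cosine similarity as a stand-in for the paper's novelty criterion. The informativeness and relevance scores are assumed to be precomputed by some model.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def summarize(images, k, min_informativeness=0.5, trade_off=0.7):
    """Return the ids of up to k images: filter, rank, then re-rank by novelty.

    Each image is a dict with hypothetical keys:
      'id', 'informativeness' (0..1), 'relevance' (0..1), 'features' (floats).
    """
    # Step 1: drop non-informative images (stand-in for a learned filter).
    candidates = [im for im in images
                  if im["informativeness"] >= min_informativeness]

    # Step 2: rank the survivors by relevance, highest first.
    candidates.sort(key=lambda im: im["relevance"], reverse=True)

    # Step 3: greedy MMR-style re-ranking. Each pick maximizes relevance
    # minus a penalty for similarity to images that were already selected.
    selected = []
    while candidates and len(selected) < k:
        def score(im):
            redundancy = max(
                (cosine(im["features"], s["features"]) for s in selected),
                default=0.0,
            )
            return trade_off * im["relevance"] - (1 - trade_off) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [im["id"] for im in selected]
```

With this scoring, a near-duplicate of an already selected image is pushed down the list even when its relevance is high, which is the redundancy-reduction effect the abstract describes. Raising `trade_off` toward 1 favors relevance over novelty.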

Index Terms

- Computing methodologies
  - Artificial intelligence
    - Computer vision
      - Computer vision tasks
        - Scene understanding
- Information systems
  - Information systems applications
    - Multimedia information systems

Published in
LTA '17: Proceedings of the 2nd Workshop on Lifelogging Tools and Applications
October 2017
40 pages
ISBN:9781450355032
DOI:10.1145/3133202
Program Chairs:
Cathal Gurrin, Insight Centre for Data Analytics & Dublin City University, Ireland
Xavier Giro-i-Nieto, Universitat Politecnica de Catalunya, Catalonia/Spain
Petia Radeva, Universitat de Barcelona, Spain
Duc-Tien Dang-Nguyen, Insight Centre for Data Analytics & Dublin City University, Ireland
Mariella Dimiccoli, Computer Vision Centre & Universitat de Barcelona, Spain
Hideo Joho, University of Tsukuba, Japan
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
convolutional neural networks
lifelogging
photo stream summarization
semantic summarization
Qualifiers
- research-article
Acceptance Rates
- LTA '17 paper acceptance rate: 2 of 3 submissions, 67%
- Overall acceptance rate: 6 of 10 submissions, 60%