Beyond document object detection: instance-level segmentation of complex layouts

Biswas, Sanket; Riba, Pau; Lladós, Josep; Pal, Umapada

doi:10.1007/s10032-021-00380-6

Special Issue Paper
Published: 21 July 2021

Volume 24, pages 269–281 (2021)

International Journal on Document Analysis and Recognition (IJDAR)

Sanket Biswas ORCID: orcid.org/0000-0001-6648-8270¹,
Pau Riba¹,
Josep Lladós¹ &
Umapada Pal²

Abstract

Information extraction is a fundamental task in many business intelligence services that involve large-scale document processing. Understanding the structure of a document page in terms of its layout provides contextual support that helps in the semantic interpretation of the document's terms. In this paper, inspired by the progress of deep learning methods for object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we contribute to the prior art by defining the task of instance segmentation in the document image domain. An instance segmentation paradigm is especially important for complex layouts whose contents must interact for the page to render properly, e.g., text wrapping correctly around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art.
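
The text-wrapping case mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' method or code: the grid size, the region shapes, and the helper names (`bounding_box`, `box_cells`, `iou`) are all assumptions made for illustration. It shows that when a text block wraps around an image, the text's bounding box swallows the image entirely, whereas the two instance masks do not overlap at all.

```python
# Toy 6x8 page grid (hypothetical layout, for illustration only).
ROWS, COLS = 6, 8

# The image occupies the top-right corner of the page.
image_mask = {(r, c) for r in range(0, 3) for c in range(5, 8)}

# The text fills every remaining cell, so it wraps around the image (L-shape).
text_mask = {(r, c) for r in range(ROWS) for c in range(COLS)} - image_mask


def bounding_box(mask):
    """Return the tight box (top, left, bottom, right) around a set of cells."""
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    return min(rows), min(cols), max(rows), max(cols)


def box_cells(box):
    """Return every cell covered by an inclusive (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return {(r, c) for r in range(top, bottom + 1) for c in range(left, right + 1)}


def iou(a, b):
    """Intersection over union of two sets of cells."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Detection view: compare the two bounding boxes.
box_iou = iou(box_cells(bounding_box(text_mask)), box_cells(bounding_box(image_mask)))

# Instance segmentation view: compare the two pixel-level masks.
mask_iou = iou(text_mask, image_mask)

print(f"box IoU  = {box_iou:.4f}")
print(f"mask IoU = {mask_iou:.4f}")
```

In this toy layout the text's bounding box covers the whole page, so the boxes overlap even though the regions themselves are disjoint; the masks keep the two objects separate, which is the kind of distinction a box-only detector cannot express.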

Acknowledgements

This work has been partially supported by the Spanish projects RTI2018-095645-B-C21 and FCT-19-15244, the Catalan project 2017-SGR-1783, the CERCA Program / Generalitat de Catalunya, and a PhD Scholarship from AGAUR (2021FIB-10010).

Author information

Authors and Affiliations

Computer Vision Center & Computer Science Department, Universitat Autónoma de Barcelona, Bellaterra, Spain
Sanket Biswas, Pau Riba & Josep Lladós
CVPR Unit, Indian Statistical Institute, Kolkata, India
Umapada Pal

Corresponding author

Correspondence to Sanket Biswas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Biswas, S., Riba, P., Lladós, J. et al. Beyond document object detection: instance-level segmentation of complex layouts. IJDAR 24, 269–281 (2021). https://doi.org/10.1007/s10032-021-00380-6

Received: 19 November 2020
Revised: 27 May 2021
Accepted: 08 June 2021
Published: 21 July 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s10032-021-00380-6
