Abstract
Optical Music Recognition is the task of transcribing an image of a music score into a machine-readable format. Many music scores are written on a single staff and can therefore be treated as a sequence. This work explores the use of Long Short-Term Memory (LSTM) Recurrent Neural Networks to read the music score sequentially, with the LSTM helping to preserve context. For training, we used a synthetic dataset of more than 40,000 images, labeled at the primitive level. The experimental results are promising and show the benefits of our approach.
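The idea of reading a single-staff score as a sequence can be illustrated with a minimal sketch. It is not the architecture used in this work: the cell below has random, untrained weights, there is no output layer or training loop, and all names (`TinyLSTMCell`, `read_staff`, the toy image) are hypothetical. It only shows how pixel columns can be fed left to right through an LSTM cell whose state carries context from earlier columns.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """A single LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix and bias per gate: input (i), forget (f),
        # candidate (g) and output (o). Each row sees [column ; previous h].
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                   for _ in range(n_hidden)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * n_hidden for gate in "ifgo"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        """Standard LSTM update for one input vector x."""
        z = list(x) + list(h)
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def read_staff(image, cell):
    """Feed the image to the cell one pixel column at a time, left to right."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    states = []
    for x in range(len(image[0])):
        column = [row[x] for row in image]
        h, c = cell.step(column, h, c)
        states.append(h)
    return states


# Toy 4x6 binary "staff image" (rows x columns); the values are made up.
image = [
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
cell = TinyLSTMCell(n_in=4, n_hidden=3)
states = read_staff(image, cell)
```

The first two columns of the toy image are identical, yet `states[0]` and `states[1]` differ, because the second step also receives the hidden and cell state left by the first. That carried state is what lets a recurrent reader keep context along the staff.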
Notes
- 3. Symbols appear with specific duration (rhythm) and pitch (melody).
- 8. L = Line; S = Space; L1 is the bottom line on the staff and S1 is the space between lines 1 and 2.
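The L/S labelling described in the notes above can be sketched as a small mapping. The zero-based position index (0 for the bottom line, counting upward through spaces and lines) and the function name are assumptions made for illustration. Only the five lines and four spaces of the staff are covered, since the notes do not say how positions outside the staff are labeled.

```python
def staff_position_label(position: int) -> str:
    """Map a staff position, counted upward from the bottom line (0), to an L/S label."""
    if not 0 <= position <= 8:
        raise ValueError("only the five lines and four spaces of the staff are covered")
    # Even positions fall on lines, odd positions on the spaces between them.
    kind = "L" if position % 2 == 0 else "S"
    return f"{kind}{position // 2 + 1}"


labels = [staff_position_label(p) for p in range(9)]
```

With this convention, position 0 gives `L1` (the bottom line) and position 1 gives `S1` (the space between lines 1 and 2), matching the definition in the notes.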
Acknowledgment
This work has been partially supported by the Spanish project TIN2015-70924-C2-2-R, the Ramon y Cajal Fellowship RYC-2014-16831, the CERCA Program/Generalitat de Catalunya, the FPU fellowship FPU15/06264 from the Spanish Ministerio de Educación, Cultura y Deporte, the Social Sciences and Humanities Research Council of Canada, and the FI fellowship AGAUR 2018 FI_B 00546 of the Generalitat de Catalunya. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Baró, A., Riba, P., Calvo-Zaragoza, J., Fornés, A. (2018). Optical Music Recognition by Long Short-Term Memory Networks. In: Fornés, A., Lamiroy, B. (eds) Graphics Recognition. Current Trends and Evolutions. GREC 2017. Lecture Notes in Computer Science, vol 11009. Springer, Cham. https://doi.org/10.1007/978-3-030-02284-6_7
DOI: https://doi.org/10.1007/978-3-030-02284-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02283-9
Online ISBN: 978-3-030-02284-6
eBook Packages: Computer Science, Computer Science (R0)