Informational content of cosine and other similarities calculated from high-dimensional Conceptual Property Norm data

Cognitive Processing

Abstract

To study concepts that are coded in language, researchers often collect lists of conceptual properties produced by human subjects. From these data, different measures can be computed. In particular, inter-concept similarity is an important variable used in experimental studies. Among possible similarity measures, the cosine of conceptual property frequency vectors seems to be a de facto standard. However, there is a lack of comparative studies that test the merit of different similarity measures when computed from property frequency data. The current work compares four different similarity measures (cosine, correlation, Euclidean and Chebyshev) and five different types of data structures. To that end, we compare the informational content (i.e., entropy) delivered by each of the 4 × 5 = 20 combinations, and use a clustering procedure as a concrete example of how informational content affects statistical analyses. Our results lead us to conclude that similarity measures computed from lower-dimensional data fare better than those calculated from higher-dimensional data, and suggest that researchers should be more aware of data sparseness and dimensionality, and of their consequences for statistical analyses.
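As a rough illustration of the kind of comparison described above, the following Python sketch computes the four measures on toy property frequency vectors and then estimates the Shannon entropy of each measure's pairwise values. The concept names, the frequencies, the bin count and the histogram-based entropy estimate are all assumptions made for this example; the abstract does not specify how entropy was computed, so this should not be read as the authors' procedure.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def correlation(u, v):
    """Pearson correlation, i.e. the cosine of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


def euclidean(u, v):
    """Euclidean distance (a dissimilarity, not a similarity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def chebyshev(u, v):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def shannon_entropy(values, bins=10):
    """Entropy in bits of an equal-width histogram of `values`.

    Assumption: a simple histogram estimate; the source does not say
    which estimator was used.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # every value falls in one bin
    width = (hi - lo) / bins
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical toy norms: rows are concepts, columns are property frequencies.
norms = {
    "dog": [12, 9, 0, 0, 3],
    "cat": [10, 8, 0, 1, 2],
    "car": [0, 0, 11, 7, 0],
}

measures = {
    "cosine": cosine,
    "correlation": correlation,
    "euclidean": euclidean,
    "chebyshev": chebyshev,
}

# Entropy of the pairwise values produced by each measure.
entropy_by_measure = {}
for name, measure in measures.items():
    pairwise = [measure(norms[a], norms[b]) for a, b in combinations(norms, 2)]
    entropy_by_measure[name] = shannon_entropy(pairwise)
    print(name, round(entropy_by_measure[name], 3))
```

With real norms, the vectors would typically be much longer and sparser than in this toy example, which is the situation the abstract flags as consequential for these measures.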


Acknowledgements

We want to thank two anonymous reviewers for their useful comments on a previous version of this manuscript. We also want to thank Eyal Sagi for his valuable input regarding the ideas discussed here. This research was carried out with funds provided by grant 1200139 from the Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT) of the Chilean government to the first and second authors.

Author information

Corresponding author

Correspondence to Enrique Canessa.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the special topic ‘Eliciting Semantic Properties: Methods and Applications’, guest-edited by Barry Devereux and Alessandro Lenci.

Handling editor: Alessandro Lenci; Reviewers: David Vinson (University College London), Cai Wingfield (Lancaster University).

About this article

Cite this article

Canessa, E., Chaigneau, S.E., Moreno, S. et al. Informational content of cosine and other similarities calculated from high-dimensional Conceptual Property Norm data. Cogn Process 21, 601–614 (2020). https://doi.org/10.1007/s10339-020-00985-5

  • DOI: https://doi.org/10.1007/s10339-020-00985-5
