Publishers, help us capture Events for your content
The day I received my learner driver permit, I remember being handed three things: a plastic thermosealed reminder that age sixteen was not a good look on me; a yellow L-plate sign as flimsy as my driving ability; and a weighty ‘how to drive’ guide listing all the things that I absolutely must not, under any circumstances, even-if-it-seems-like-a-really-swell-idea-at-the-time, never, ever do.
The margin space dedicated to finger-wagging left little room for championing any driving-do’s. And as each page delivered a fresh new warning, my enthusiasm for hitting the road sank to levels usually reserved for activities like trigonometry and visits to my orthodontist.
Many years (and an excellent driving record) later, I’m reminded of this when thinking about our own Event Data User Guide, because it contains a chapter with some really important don’ts for our members. Really good, we’d-love-you-to-consider-not-doing-these-things type of advice. But despite our intent to encourage, I feel the ghost of finger-waggers past. So in the spirit of championing enthusiasm over ennui, I thought I’d attempt to contextualise our Event Data Best Practices Guide for Publishers and show you why there are plenty of good reasons for publishers to be enthusiastic about these rules.
So if you’re a publisher, I encourage you to read on to learn more about how you can help us have the best chance possible of capturing Events for your content.
What’s in it for you? Well, collecting this data helps to give everyone (Crossref, yourself, and others) a better picture of how your content is being used, including for altmetrics.
1. Please let us in
Please do open the door when we come knocking; we promise not to stay long. You can do this by allowing the User Agent CrossrefEventDataBot to visit your site, and whitelisting it if necessary. The bot is how we visit URLs to confirm whether they are for an item of content registered with us. Reasons we might visit your site include:
- someone tweeted an article landing page
- someone discussed it on Reddit
- it was linked to from a blog post
The Bot has only one job: to work out the DOI. No information beyond this is stored. Whenever we become aware of a link that we think points to a DOI or an Article Landing Page, we follow it so we can collect the required metadata. Everything in Crossref Event Data is linked via its DOI, so it’s important that we can collect this information.
The bot will identify itself using the standard method. It sets two headers:
Once we confirm that a link points to registered content, we then log an Event for the DOI. You should expect our bot to visit no more than once or twice per second, although if there is a period of activity around your articles, you may see higher rates. The bot also takes a sample of DOIs and visits them to work out which domain names belong to our members, so it can maintain a list. This can happen every few weeks, and when it does you may see a small number of requests from the bot, limited to one per second.
If we can’t enter your site to look for metadata though, then we won’t be able to collect Events for your DOIs. So by allowing our bot, you will be helping us to collect Event Data for your registered content.
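If your site already filters automated traffic by User Agent, an exception might look something like the sketch below. It is only an illustration: the function names and the list of blocked tokens are hypothetical, and the only detail taken from this post is the `CrossrefEventDataBot` name, matched here as a substring of the `User-Agent` header. Bear in mind that User Agent strings can be spoofed, so treat this as a courtesy filter rather than a security control.

```python
from typing import Iterable, Optional

# The bot name comes from this post; everything else here is a hypothetical sketch.
EVENT_DATA_BOT_TOKEN = "CrossrefEventDataBot"


def is_event_data_bot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header mentions the Event Data bot."""
    return EVENT_DATA_BOT_TOKEN.lower() in (user_agent or "").lower()


def should_block(user_agent: Optional[str], blocked_tokens: Iterable[str]) -> bool:
    """Decide whether a request should be blocked by a User Agent filter.

    The Event Data bot is checked first, so it is let through even when a
    broad rule (for example, blocking anything containing "bot") would
    otherwise catch it.
    """
    if is_event_data_bot(user_agent):
        return False
    agent = (user_agent or "").lower()
    return any(token.lower() in agent for token in blocked_tokens)
```

Checking the exception before the block list is the design choice that matters here: it keeps a catch-all rule from shutting the bot out by accident.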
If you’re worried about traffic on your site, consider sending us your mapping of article landing pages to DOIs. Because Resource URLs aren’t the same as article landing pages, we need more information than the DOI Resource URLs that you already send us.
If you’re running a blog or website (and you’re not a member of Crossref), you may also see our bot visiting, to look for links that comprise Events. Please allow us to visit, so we can record in our Event Data service the fact that your website links to registered content.
2. We ❤️ robots.txt
Robots.txt files are important and we ensure our Event Data Bot respects yours. If we are instructed not to visit a site, we won’t. So if you want us to visit your site in order to check the metadata of your article landing page, please ensure you provide an exception for our Bot, or make sure that you’re not blocking it. Check the restrictions in your file to see if we’re allowed to visit. This is just another way you can help us work for you.
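One way to check what your current file says about our bot is with Python’s standard `urllib.robotparser` module. The sketch below is a rough illustration, not a description of how our bot evaluates rules internally: the robots.txt content and the `example.org` URLs are made up, and the helper name `bot_may_visit` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps generic crawlers out of /articles/
# but makes an explicit exception for the Event Data bot.
ROBOTS_TXT = """\
User-agent: CrossrefEventDataBot
Allow: /

User-agent: *
Disallow: /articles/
"""


def bot_may_visit(robots_txt: str, url: str,
                  agent: str = "CrossrefEventDataBot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    landing_page = "https://example.org/articles/123"
    print(bot_may_visit(ROBOTS_TXT, landing_page))
    print(bot_may_visit(ROBOTS_TXT, landing_page, agent="SomeOtherBot"))
```

To test your real file, read its contents into a string and pass it in place of `ROBOTS_TXT`, along with one of your own article landing page URLs.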
3. Include the DC Identifier
Including good metadata is general best practice for scholarly publishing. When we visit a publisher’s site, we look for metadata embedded in the HTML document (such as DC.Identifier tags that, amongst other things, enable Crossmark to work).
By ensuring you include a Dublin Core identifier meta tag in each of your article pages, our system can match your landing pages back to DOIs.
Here’s an example:
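As a rough illustration of how a visitor might read such a tag back out of a landing page, here is a minimal sketch using Python’s standard `html.parser`. The sample page, the DOI `10.5555/12345678`, and the exact form of the `content` value are placeholders assumed for this example, and the code is not how our bot is actually implemented.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical landing page; the DOI below is a placeholder, not a real article.
SAMPLE_PAGE = """\
<html>
  <head>
    <title>An example article</title>
    <meta name="DC.Identifier" content="10.5555/12345678" />
  </head>
  <body><h1>An example article</h1></body>
</html>
"""


class DCIdentifierParser(HTMLParser):
    """Collects the content of the first DC.Identifier meta tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.identifier: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop once an identifier is found.
        if tag != "meta" or self.identifier is not None:
            return
        attributes = dict(attrs)
        # Compare the name case-insensitively (DC.Identifier, dc.identifier, ...).
        if (attributes.get("name") or "").lower() == "dc.identifier":
            self.identifier = attributes.get("content")


def find_dc_identifier(html: str) -> Optional[str]:
    """Return the DC.Identifier value from an HTML page, or None if absent."""
    parser = DCIdentifierParser()
    parser.feed(html)
    return parser.identifier


if __name__ == "__main__":
    print(find_dc_identifier(SAMPLE_PAGE))
```

Running a check like this against your own landing pages is a quick way to confirm the tag is present in the HTML that is served, before any scripts run.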
4. Let us in, even if we don’t bring cookies
We’re like that friend who turns up for dinner without bringing a bottle of wine. And we hope that you’ll be OK with that. Some publisher sites don’t allow browsers to visit unless cookies are enabled, and they block visitors that don’t accept them. If your site does this, we will be unable to collect Events for your DOIs. Allowing your site to be accessed without cookies will help give us the best chance of successfully reading your metadata.
5. We may not speak your language
If you want to pass this on to your friendly system administrator, the best practice is documented in full here: https://www.eventdata.crossref.org/guide/best-practice/publishers-best-practice/. And sorry about all the don’ts you’ll find on that page… Don’t let them curb your enthusiasm for taking Event Data out for a spin!