July 29, 2008

Search Web Services - New Committee Drafts

As posted here on the SRU Implementors list, the OASIS Search Web Services Technical Committee has announced the release of five Committee Drafts, informally known as:

  1. Abstract Protocol Definition (APD)
  2. Binding for SRU 1.2
  3. Auxiliary Binding for HTTP GET
  4. CQL 1.2
  5. Binding for OpenSearch
Links to specific document formats are given at the bottom of the mail. A list of the TC public documents is also available here.

The next phase of work for the TC will be the development of SRU/CQL 2.0, and the Description Language.

July 28, 2008

Five Years

Oh wow! A rather remarkable plea here from Dan Brickley on the public-lod mailing list, which calls for the registrant of the dbpedia.org DNS entry to top it up with another 5+ years' worth of clock time. Some quotes:

"The idea of such a cool RDF namespace having only 6 months left on the DNS registration gives me the worries."

"If you could add another 5-10 years to the DNS registration I'd sleep easier at night."

"Let me stress I'm not suggesting that this domain is actually at risk. Just that the not-at-risk-ness isn't readily evident from a quick look in the DNS."

"Those in the know are probably confident this is all in hand, but as the SW gets bigger I suspect we ought to establish practices such as "vocabularies that seek global adoption should always have 5+ years on their DNS registries"."

Yes, and maybe those cool URIs should have kite marks, too. ;)

(Btw, for those who may not already know, the maximum length of time for which any DNS name may be leased out in a single registration is 10 years; see the FAQ put out by ICANN.)

So, pity the poor user of a given semantic web application who may not know what the life expectancy is of the nodes in an RDF graph of assertions. Shifting sands, indeed.

Does Size Matter?

Interesting post from Google, in which they say:

"Recently, even our search engineers stopped in awe about just how big the web is these days -- when our systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!"
Puts CrossRef's 32,639,020 unique DOIs into some kind of perspective: 0.0033%. But nonetheless that trace percentage still seems to me to be reasonably large, especially in view of it forming a persistent and curated set.
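
The percentage is easy to check. A minimal sketch, using only the two figures quoted in this post (the variable names are mine):

```python
# Figures quoted in the post: CrossRef's DOI count and Google's URL milestone.
crossref_dois = 32_639_020
google_urls = 1_000_000_000_000

# Share of the trillion URLs that the DOI set would represent, as a percentage.
share = crossref_dois / google_urls * 100
print(f"{share:.4f}%")  # prints 0.0033%
```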

Update: Talking of Google numbers, pingdom has a post "Map of all Google data center locations" with maps of US, Europe and World locations.

July 24, 2008

Knols and Citations Part II

Tony's post highlights Knol's "service" URIs. Another issue is that many Knol entries have nice long lists of unlinked references. The HTML code behind the references is very sparse.

Might the DOI be of use in linking out from these references? I think so. Then, of course, there's the issue of DOIs for Knols...

Knols and Citations

So, Google's Knol is now live (see this announcement on Google's Blog). There'll be comment aplenty about the merits of this service and how it compares to other user-contributed content sites. But one curious detail struck me. In terms of citeability, compare how a Knol contribution (or "knol") may be linked to with how a corresponding entry in Wikipedia may be linked to (here I've chosen the subject "Eclipse"):

Knol
http://knol.google.com/k/jay-pasachoff/eclipse/IDZ0Z-SC/wTLUGw
Wikipedia
http://en.wikipedia.org/wiki/Eclipse
The Knol link includes author name, subject, and service gunk, while the Wikipedia link includes only the subject. That makes the Wikipedia link both more readily citeable and, to some degree, discoverable. I wonder what Google's intentions, if any, are with respect to the citing of their pages (or "knols") as authoritative sources of information. They don't seem to be doing themselves many favours.
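
To make the comparison concrete, here is a minimal sketch that splits the two links above into their path segments. Reading the Knol segments as author, subject and opaque service tokens is just my interpretation of the link, not a documented Knol URL scheme, and the helper name `path_segments` is made up for illustration.

```python
from urllib.parse import urlsplit

knol = "http://knol.google.com/k/jay-pasachoff/eclipse/IDZ0Z-SC/wTLUGw"
wikipedia = "http://en.wikipedia.org/wiki/Eclipse"

def path_segments(url):
    """Return the non-empty path segments of a URL."""
    return [seg for seg in urlsplit(url).path.split("/") if seg]

knol_segments = path_segments(knol)
wiki_segments = path_segments(wikipedia)

print(knol_segments)  # ['k', 'jay-pasachoff', 'eclipse', 'IDZ0Z-SC', 'wTLUGw']
print(wiki_segments)  # ['wiki', 'Eclipse']
```

The Knol path carries five segments against Wikipedia's two, and only one of the five is the subject someone citing the page would actually want to name.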

I am minded of this post on Jeff Young's Q6 which cites this passage from the HTTP spec (see RFC 2616, Sect. 3.2):

"As far as HTTP is concerned, Uniform Resource Identifiers are simply formatted strings which identify--via name, location, or any other characteristic--a resource."
URIs bearing these so-called "characteristics" are what I would call a service URI in contrast to a name URI (something that I will elaborate on in a separate post). For now, however, I would just note that the Knol URI looks more like a service URI and the Wikipedia URI more like a name URI. I know which URI form I would prefer to cite.

July 21, 2008

CrossTech By Numbers

CrossTech is two years old (less one month) and we have now seen some 145 posts. Breaking the posts down by poster we arrive at the following chart:
[Image: crosstech.png, chart of CrossTech posts broken down by poster]
Note this is not any real attempt at vainglory, more a simple excuse to play with the wonderful Google Chart API. Also, above I've taken the liberty of putting up an image (.png), although the chart could have been generated on the fly from this link (or tinyurl here).
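
For the curious, a chart link of this kind can be assembled roughly as follows. This is only a sketch: it assumes the image-chart URL parameters (`cht`, `chs`, `chd`, `chl`) and endpoint as the Chart API documented them at the time, and the per-poster counts are placeholders rather than the real figures behind the chart.

```python
from urllib.parse import urlencode

# Placeholder counts per poster: NOT the real figures behind the chart.
posts = {"TH": 10, "EN": 5, "RK": 3, "EP": 2}

total = sum(posts.values())
# The text encoding ("t:") expects values in the range 0-100, so send percentages.
percentages = [round(100 * n / total, 1) for n in posts.values()]

params = {
    "cht": "p",                                      # chart type: pie
    "chs": "400x200",                                # chart size in pixels
    "chd": "t:" + ",".join(map(str, percentages)),   # data series
    "chl": "|".join(posts),                          # slice labels
}

# Keep ':', ',' and '|' unescaped so the link stays readable.
url = "http://chart.apis.google.com/chart?" + urlencode(params, safe=":,|")
print(url)
```

Because the whole chart is described by the query string, the same link can be pasted into a post, shortened, or regenerated whenever the counts change.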

What is of interest in the chart is that approximately 3/4 of the posts are by CrossRef members (TH, EN, RK) and 1/4 by CrossRef staff (EP, GB, AT, CK). Certainly CrossRef staffers are doing their bit for this blog. There are also way too many posts from me. It would be really interesting to see some others' views or observations per the CrossTech logo legend ("..., collaboration, ...").

I guess the real impediment is that one needs to request an account before posting. (Certainly there's no reason for any member to be shy about requesting an account and posting.) Note that I haven't considered the number of commenters on the blog, which is larger than the number of posters. Also, a number of CrossRef members are very active with their own blogs. Those blogs with a tech focus could (should?) be scooped up by a Planet-style aggregator if there were sufficient interest in maintaining a publishing technology hub.

One can only hope that the numbers will continue to grow (by direct posts or by aggregations) and that there will be a wider info share over the next couple of years.

Metadata Matters

Andy Powell has published on Slideshare this talk about metadata - see his eFoundations post for notes. It's 130 slides long and aims

"to cover a broad sweep of history from library cataloguing, thru the Dublin Core, Web search engines, IEEE LOM, the Semantic Web, arXiv, institutional repositories and more."
Don't be fooled by the length though. This is a flip through and is a readily accessible overview on the importance of metadata. Slides 86-91 might be of interest here. ;)


Library APIs

Roy Tennant in a post to XML4Lib announces a new list of library APIs hosted at

http://techessence.info/apis/
A useful rough guide for us publishers to consider as we begin cultivating the multiple access routes into our own content platforms and tending to the "alphabet soup" that taken together comprises our public interfaces.

July 9, 2008

PRISM Press Release

The PRISM metadata standards group issued a press release yesterday which covered three points:

PRISM Cookbook
The Cookbook provides "a set of practical implementation steps for a chosen set of use cases and provides insights into more sophisticated PRISM capabilities. While PRISM has 3 profiles, the cookbook only addresses the most commonly used profile #1, the well-formed XML profile. All recipes begin with a basic description of the business purpose it fulfills, followed by ingredients (typically a set of PRISM metadata fields or elements), and, closes with a step-by-step implementation method with sample XMLs and illustrative images."
PRISM 2.0 Errata
The Errata "addresses a range of issues, from editorial to technical, that have been reported by the PRISM user community."
PRISM 2.1
The next version of the PRISM Specification, PRISM 2.1, is slated for release in late 2008. "This release will address complex rights for multi-platform and global distribution channels."

July 8, 2008

Now What About XMP?

With PDF now passed over to ISO as keeper of the format (as blogged here on CrossTech), Kas Thomas (on CMS Watch's TrendWatch) blogs here that Adobe should now do the right thing by XMP and look to hand that over too in order to establish it as a truly open standard. As he says:

"Let's cut to the chase. If Adobe wants to demonstrate its commitment to openness, it should do for XMP what it has already done for PDF: Put it in the hands of a legitimate standards body. Right now it's open in name only. "
And this:
"Adobe is pushing the XMP standard ... at Adobe's pace and in ways that benefit Adobe. (The parallels with PDF are numerous and obvious.) There are lingering technical issues waiting to be solved, however. Issues whose solutions shouldn't have to be dependent on Adobe's needs only."
He's absolutely bang on. With XMP on the threshold of finally shining through we really could do with Adobe cutting it loose. It's time to leave home.

July 3, 2008

Q6

For anybody interested in the whys and wherefores of OpenURL, Jeff Young at OCLC has started posting over on his blog Q6: 6 Questions - A simpler way to understand OpenURL 1.0: Who, What, Where, When, Why, and How. He's already amassing quite a collection of thought-provoking posts. His latest is The Potential of OpenURL, from which:

"OpenURL has effectively cornered the niche market where Referrers need to be decoupled from Resolvers."
Blog has UML diags, definitions, musings, etc. - something for everybody. Definitely worth checking out.

ISO Standard for PDF

I blogged here back in Jan. 2007 about Adobe submitting PDF 1.7 for standardization by ISO. From yesterday's ISO press release this:

"The new standard, ISO 32000-1, Document management – Portable document format – Part 1: PDF 1.7, is based on the PDF version 1.7 developed by Adobe. This International Standard supplies the essential information needed by developers of software that create PDF files (conforming writers), software that reads existing PDF files and interprets their contents for display and interaction (conforming readers), and PDF products that read and/or write PDF files for a variety of other purposes (conforming products)."
Congrats to Adobe Systems!

July 1, 2008

Client Handle Demo

This test form shows handle value data being processed by JavaScript in the browser using an OpenHandle service. This is different from the handle proxy server which processes the handle data on the server - the data here is processed by the client.

Enter a handle and the standard OpenHandle "Hello World" document is printed. Other processing could equally be applied to the handle values. (Note that the form may not work in web-based feed readers.)


Exposing Public Data: Options

This is a follow-on to an earlier post which set out the lie of the land as regards DOI services and data for DOIs registered with CrossRef. That post differentiated between a native DOI resolution through a public DOI service, which acts upon the "associated values held in the DOI resolution record" (per ISO CD 26324), and other related DOI protected and/or private services, which merely use the DOI as a key into a non-public database offering.

Following the service architecture outlined in that post, options for exposing public data appear as follows:

  1. Private Service
    1. Publisher hosted – Publisher private service
  2. Protected Service
    1. CrossRef hosted – Industry protected service
    2. CrossRef routed – Publisher private service
  3. Public Service
    1. Handle System (DOI handle) – Global public service (native DOI service)
    2. Handle System (DOI ‘buddy’ handle) – Publisher public service

(Continues below.)

Option #1 would make public data available through a private service at a publisher host based on the DOI. This places certain constraints on service discovery and persistence. Autodiscovery links can be placed into Web pages, but there is no opportunity to ‘embed’ the services into the DOI itself, and hence these cannot be considered native DOI services. Without a published API (and hence some degree of commitment from the publisher) the service access points (and possibly the services, too) are fragile.

Option #2 would require CrossRef to develop a service which would either a) deliver some public data on behalf of the publisher, or b) route requests through to a bespoke publisher service. Both options would require development at CrossRef and an upload mechanism for the publisher to pass along the data or a service address. Both options would be offered as a new member service and would thus likely be subject to membership policy arrangements. One should consider that there would be some restrictions on service operation. One possible restriction might be that this would be a one-time service registration at CrossRef and that any additional services would need to be added at the publisher end.

Option #3 uses the existing Handle System infrastructure and provides a public read service. There are two possibilities: a) add a record (or records) to the DOI handle, or b) add records to a DOI ‘buddy’ handle under publisher control. Both require further explanation:

Option #3a would require CrossRef consent. Unless these records (handle values) were registered by CrossRef there would be concerns over interoperability. That and security concerns would almost certainly require that CrossRef writes the record. But this would then need to be developed as per Option #2 above. And if a mechanism were put in place it could be restrictive in practice, e.g. not allowing additional records to be inserted (as already noted in Option #2).

Option #3b requires no prior CrossRef consent. It is an option available to publishers who run a handle server. This can best be viewed as deploying a platform (a DOI 'buddy' handle) for hosting service access points, with the intention of uploading them into the DOI handle (effectively Option #3a) as common public services are developed. In short, a public service incubator. In the meantime the platform provides for an independent deployment, and multiple services can be added as required. An uplink from the DOI ‘buddy’ handle to the DOI handle would be maintained and, as CrossRef allows, a down link from the DOI handle to the DOI ‘buddy’ handle (a ‘see also’ type link) could also be established, thus pairing the two handles. (Of course, additional values, whether held in the DOI resolution record or especially in an associated DOI 'buddy' record, would be subject to common typing constraints for semantic interoperability.)
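
The pairing of the two handles can be pictured with a small sketch. Handle values are modelled here as (index, type, data) triples; everything else is an assumption made for illustration: the handle names, the example URLs, and the `X_SEE_ALSO`, `X_DOI` and `X_SERVICE` type names are hypothetical and are not registered handle value types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HandleValue:
    index: int
    type: str
    data: str

# All handles, type names and URLs below are hypothetical placeholders.
DOI_HANDLE = "10.9999/example"
BUDDY_HANDLE = "99999/buddy-example"

records = {
    # The DOI handle: written by CrossRef, so the down link needs its consent.
    DOI_HANDLE: [
        HandleValue(1, "URL", "http://publisher.example/article"),
        HandleValue(2, "X_SEE_ALSO", BUDDY_HANDLE),  # down link to the buddy
    ],
    # The buddy handle: under publisher control, services added as required.
    BUDDY_HANDLE: [
        HandleValue(1, "X_DOI", DOI_HANDLE),         # uplink to the DOI handle
        HandleValue(2, "X_SERVICE", "http://publisher.example/metadata"),
        HandleValue(3, "X_SERVICE", "http://publisher.example/citations"),
    ],
}

def linked(handle, link_type):
    """Return the data of every value of the given type on a handle."""
    return [v.data for v in records[handle] if v.type == link_type]

def is_paired(doi, buddy):
    """True when the down link and the uplink point at each other."""
    return buddy in linked(doi, "X_SEE_ALSO") and doi in linked(buddy, "X_DOI")

paired = is_paired(DOI_HANDLE, BUDDY_HANDLE)
services = linked(BUDDY_HANDLE, "X_SERVICE")
print(paired)    # True
print(services)
```

A client starting from the DOI handle would follow the 'see also' value to reach the publisher's service access points, while the uplink lets anything found on the buddy handle be traced back to the DOI.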

My personal feeling is that public data is best exposed via a public resolution record with no strings attached. That is the surest way to guarantee both data persistence and accessibility.