June 10, 2009

XMP Primer

xmp-primer.jpg

There's a new XMP Primer (PDF) by Ron Roskiewicz (ed. Dianne Kennedy) available from XMP-Open. It is copyrighted 2008 but I only just saw it now. This 43-page document provides a very gentle introduction to metadata and labelling of media, then introduces XMP into the content lifecycle and makes the business case for using XMP. The primer covers the following areas:

  • Introduction to Metadata
  • Introduction to XMP
  • XMP and the Content Lifecycle
  • XMP in Action; Use Cases
  • Additional XMP Resources

One small gripe is that this seems to have been prepared for US letter-sized pages. Although it is printable on A4, there is the slightest of clipping on the right-hand margin; there is no real loss of information, but it does confer a sense of "incompleteness". Really there can be little excuse these days for this parochialism. Also, for a document talking up the benefits of using XMP, it's decidedly odd that it doesn't make use of XMP itself. Or rather, there is a default XMP packet in the PDF, but it carries no really useful properties such as title, author, or date. It could have been a nice little object lesson in using XMP.

June 05, 2009

Aligning OpenSearch and SRU

[Update - 2009.06.07: As pointed out by Todd Carpenter of NISO (see comments below) the phrase "SRU by contrast is an initiative to update Z39.50 for the Web" is inaccurate. I should have said "By contrast SRU is an initiative recognized by ZING (Z39.50 International Next Generation) to bring Z39.50 functionality into the mainstream Web".]

[Update - 2009.06.08: Bizarrely I find in mentioning query languages below that I omitted to mention SQL. I don't know what that means. Probably just that there's no Web-based API. And that again it's tied to a particular technology - RDBMS.]

queryType.png

There are two well-known public search APIs for generic Web-based search: OpenSearch and SRU. (Note that the key term here is "generic", so neither Solr/Lucene nor XQuery really qualifies for that slot. Also, I am concentrating here on "classic" query languages rather than on semantic query languages such as SPARQL.)

OpenSearch was created by Amazon's A9.com and is a cheap and cheerful means to interface to a search service by declaring a template URL and returning a structured XML format. It therefore allows for structured result sets while placing no constraints on the query string. As outlined in my earlier post Search Web Service, there is support for search operation control parameters (pagination, encoding, etc.), but no inroads are made into the query string itself which is regarded as opaque.

SRU by contrast is an initiative to update Z39.50 for the Web and is firmly focussed on structured queries and responses. Specifically a query can be expressed in the high-level query language CQL which is independent of any underlying implementation. Result records are returned using any declared W3C XML Schema format and are transported within a defined XML wrapper format for SRU. (Note that the SRU 2.0 draft provides support for arbitrary result formats based on media type.)
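To make the "defined XML wrapper" a little more concrete, here is a minimal sketch of how a client might pick apart an SRU response. The sample document is made up for illustration (including the placeholder title), and the namespace URI and element names are those I understand SRU 1.1 to use, so treat them as assumptions to be checked against the specification.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: the SRU 1.1 response wrapper and Dublin Core elements.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative response only; a real server would return richer records.
SAMPLE = """
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>1</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>info:srw/schema/1/dc-v1.1</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Placeholder title</dc:title>
      </srw:recordData>
      <srw:recordPosition>1</srw:recordPosition>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""


def summarize(xml_text):
    """Return (total hits, [(record schema, [payload element tags])])."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    records = []
    for rec in root.iterfind("srw:records/srw:record", NS):
        schema = rec.findtext("srw:recordSchema", namespaces=NS)
        data = rec.find("srw:recordData", NS)
        # The wrapper is schema-agnostic: whatever sits inside recordData
        # is the record itself, in the declared schema.
        payload = list(data) if data is not None else []
        records.append((schema, [child.tag for child in payload]))
    return total, records


total, records = summarize(SAMPLE)
print(total, records)
```

The point of the wrapper is that the client code above does not need to know the record schema in advance; it reads the declared schema identifier and then hands the payload to whatever schema-specific handling it has.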

One can summarize the respective OpenSearch and SRU functionalities as in this table:

Structure    OpenSearch  SRU
query        no          yes
results      yes         yes
control      yes         yes
diagnostics  no          yes

What I wanted to discuss here was the OpenSearch and SRU interfaces to a Search Web Service such as outlined in my previous post. The diagram at top of this post shows query forms for OpenSearch and SRU and associated result types. The Search Web Service is taken to be exposing an SRU interface. It might be simplest to walk through each of the cases.

(Continues below.)


May 30, 2009

Search Web Service

search-web-service.png

While the OASIS Search Web Services TC is currently working towards reconciling SRU and OpenSearch, I thought it would be useful to share here a simple graphic outlining how a search web service for structured search might be architected.

Basically there are two views of this search web service (described in separate XML description files and discoverable through autodiscovery links added to HTML pages):

One can see at a glance that there's more happening down in the SRU layer. The SRU layer implements a heavyweight, robust service which provides a detailed listing of search indexes and index relations in the description document ('SRU Explain'), is searchable using a standard query grammar - CQL ('Contextual Query Language'), responds with result sets inside a standard XML wrapper and expressed as an XML record set (e.g. PAM) that is validatable using W3C XML Schema, and makes available a full roster of diagnostics.

By contrast the OpenSearch layer provides a lightweight view onto the search web service in which a simple opaque query string is sent to the server and a simple XML result set returned (usually RSS or Atom). Again a description document is made available ('OpenSearch Description') but this is much more coarse grained than the SRU description - e.g. it does not specify query components such as indexes or relations.

In practice, both views can be provided for by the same search web service. While OpenSearch does not specify any structured query it can make use of a CQL packaged query. That is, a single parameter value for the OpenSearch 'query' parameter can be unpacked by a CQL parser to yield a complex search query. The search query does not need to be splattered all over the URL querystring which is already using its parameter set to provide control information for the search (e.g. pagination, encoding and the like).
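As a rough sketch of that packaging, the snippet below fills an OpenSearch-style URL template with a complete CQL query as the single search-terms value. The template URL, host name and parameter names (`query`, `start`, `count`) are hypothetical; only the `{searchTerms}`, `{startIndex?}` and `{count?}` placeholder style follows OpenSearch conventions.

```python
import re
from urllib.parse import quote

# Hypothetical OpenSearch URL template; a trailing "?" marks an optional parameter.
TEMPLATE = "http://example.org/search?query={searchTerms}&start={startIndex?}&count={count?}"


def fill_template(template, **values):
    """Substitute {name} and {name?} placeholders with percent-encoded values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # safe="" so that quotes, spaces, "=" and "/" are all encoded
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required template parameter: {name}")
    return re.sub(r"\{(\w+)(\?)?\}", substitute, template)


# The whole structured query travels as one opaque parameter value.
cql = 'dc.title any "fractal pollock" and prism.publicationName = "Nature"'
url = fill_template(TEMPLATE, searchTerms=cql, startIndex=1)
print(url)
```

On the server side the decoded value of the one parameter is handed to a CQL parser, while the remaining parameters stay free for control information.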

And how would this relate to existing platform-hosted search services? Well, such services are usually bound to the host platform and are not intended to support remote applications. A search web service, on the other hand, would be ideally suited to offering direct support for running structured searches on platform-hosted content using off-platform apps.

Structured Search Using PRISM Elements

We just registered in the SRU (Search and Retrieve by URL) search registry the following components:

  • Context Sets
  • Schemas

This means that an SRU (Search and Retrieve by URL) search engine that supported one of the PRISM context sets registered above could accept CQL (Contextual Query Language) queries such as the following:
  1. prism.doi = "10.1038/nature05398"
  2. prism.publicationName = "Nature" and prism.volume = "444" and prism.number = "7119" and prism.startingPage = "E9"
  3. dc.identifier = "doi:10.1038/nature05398"
  4. dc.creator = "Jones-Smith" and prism.publicationName = "Nature" and prism.publicationDate > "2006-01-01"
  5. dc.title any "fractal pollock" and prism.publicationName = "Nature" sortBy prism.publicationDate/sort.descending
  6. "fractal analysis" and prism.publicationDate within "2005-01-01 2008-12-31" sortBy dc.creator/sort.ascending
(Note that quotes are required above for the DOI strings, which contain a "/" character, and for terms that contain whitespace, such as "fractal pollock" or the date range. Otherwise they are optional in the above examples.)
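For anyone generating such queries programmatically, a small helper can decide when a term needs quoting. This is a minimal sketch that assumes the CQL rule as I understand it: a term must be quoted if it is empty or contains whitespace, a double quote, or any of the characters < > = / ( ). The function names are my own, and the sketch does not handle terms that happen to be CQL reserved words.

```python
import re

# Characters that (by assumption, see above) force a CQL term to be quoted.
NEEDS_QUOTES = re.compile(r'[\s<>=/()"]')


def cql_term(term):
    """Quote a search term only when the assumed CQL rules require it."""
    if term == "" or NEEDS_QUOTES.search(term):
        # Escape embedded double quotes with a backslash.
        return '"' + term.replace('"', '\\"') + '"'
    return term


def cql_clause(index, relation, term):
    """Build a single 'index relation term' search clause."""
    return f"{index} {relation} {cql_term(term)}"


print(cql_clause("prism.doi", "=", "10.1038/nature05398"))  # prism.doi = "10.1038/nature05398"
print(cql_clause("prism.volume", "=", "444"))               # prism.volume = 444
print(cql_clause("dc.title", "any", "fractal pollock"))     # dc.title any "fractal pollock"
```

Always quoting every term would also be valid and is arguably simpler; the conditional version is shown only to make the rule visible.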

Any query such as one of the above (here #1) could be sent to the server on a querystring like so:

?version=1.1&operation=searchRetrieve&query=prism.doi=%2210.1038/nature05398%22

and if the server were also equipped to respond with PAM (PRISM Aggregator Message) format for result records, a response might look like this:

fractal-analysis-pam.jpg
PAM was discussed here earlier.
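The searchRetrieve query string shown earlier can be assembled with a few lines of standard-library Python. This is only a sketch: the base URL is hypothetical, and the function name is mine. The `version`, `operation`, `query` and `maximumRecords` parameter names are the SRU ones.

```python
from urllib.parse import urlencode, quote

BASE_URL = "http://example.org/sru"  # hypothetical SRU endpoint


def search_retrieve_url(cql, version="1.1", maximum_records=None):
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {"version": version, "operation": "searchRetrieve", "query": cql}
    if maximum_records is not None:
        params["maximumRecords"] = maximum_records
    # quote_via=quote encodes spaces as %20 rather than "+"
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(search_retrieve_url('prism.doi = "10.1038/nature05398"'))
```

Note that this encodes the "=" and "/" inside the query value as well (as %3D and %2F), which is stricter than the hand-written example but decodes to the same CQL string on the server.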


Such a structured response would provide the metadata elements for applications to build various interfaces into the original article:
fractal-analysis.jpg
We think that these PRISM components (context sets and schemas) will be useful for structured search of scholarly publications.

May 26, 2009

OAI-ORE: Workshop Slides

This is a very slick presentation by Herbert Van de Sompel on OAI-ORE which he's due to give today for a workshop at the INFORUM 2009 15th Conference on Professional Information Resources in Prague. It's on the long side at 167 slides, but even if you just flip through or sample it selectively you'll be bound to come away with something.

Describing aggregations of resources is a subject that really has to be of interest to CrossRef publishers.
