METADATA AND SEMANTICS IN THE ASTRONOMICAL VIRTUAL OBSERVATORY
* The astronomical VO: contexts and goals
The astronomical VO, which we can take to cover traditional night-time astronomy plus radio, X-ray and solar astronomy (and a little STP), is characterised by a large number of independent image and catalogue archives. These range in size from megabytes to (soon) petabytes and in status from well-supported to semi-formal, but they nonetheless overlap significantly in file formats, coordinate systems, and objects of interest: both X-ray and radio astronomers will be interested in a supernova remnant, both will refer to it with a right ascension and declination, and both will be able to produce a FITS file containing relevant data. The range of resources available is large enough that scientists may not be aware of all the resources that might be of use to them, or know how to use interesting resources in different wavelength ranges.
This shared technology and shared interest mean that the astronomical VO should be in a prime position to take advantage of VO ideas, and to make significant progress towards domain-wide interoperability. At the same time, the range of archive sizes means that this VO has notable data discovery problems, and significant social problems in bringing a wide range of actors to common agreement.
* IVOA responses
The International Virtual Observatory Alliance (IVOA) is a consortium of consortia, acting as a coordination point for multiple national astronomy VO projects, with a process explicitly modelled on that of the World Wide Web Consortium (W3C).
The IVOA has been successful both in brokering high- and low-level agreements on protocols and formats, and (a softer but equally challenging process) in establishing itself as the single forum which coordinates this VO and which acts as a nursery for other spinoff VO developments. I will review the history of this process.
* Semantic technologies within the IVOA
The IVOA's principal metadata outputs have been the establishment of a VO-wide resource registry, a small set of common and extensible data models, and some more sophisticated semantic experiments. The registry records image, catalogue and service resources, and has required the development of appropriate metadata schemas, registry infrastructure, and clients. The data models so far agreed cover coordinate systems and catalogue coverage information. Both models were substantially harder to agree on than was initially expected, for interesting and informative reasons.
The semantic technologies explored to date include a basic type system for astronomical data (Unified Content Descriptors, or UCDs), an ontology of astronomical object types, the early stages of a system for linking serialised data to data models, and the development of interoperable controlled vocabularies.
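To make the idea of a basic type system concrete, here is a minimal sketch of how a client might take apart a UCD string. It assumes the UCD1+ convention in which a UCD is a sequence of words separated by semicolons, each word being a chain of atoms separated by dots. The function names (parse_ucd, primary_has_prefix) are hypothetical, the example strings are only illustrative, and no check is made against the official list of UCD words.

```python
from typing import List


def parse_ucd(ucd: str) -> List[List[str]]:
    """Split a UCD1+-style string into words, and each word into atoms.

    Words are separated by ';' and atoms within a word by '.'.
    This is a syntactic split only: it does not check the words
    against any controlled list.
    """
    words = []
    for raw_word in ucd.split(";"):
        word = raw_word.strip()
        if not word:
            raise ValueError(f"empty word in UCD: {ucd!r}")
        atoms = word.split(".")
        if any(not atom for atom in atoms):
            raise ValueError(f"empty atom in UCD word: {word!r}")
        words.append(atoms)
    return words


def primary_has_prefix(ucd: str, prefix: str) -> bool:
    """Return True if the first (primary) word starts with the given atoms."""
    primary = parse_ucd(ucd)[0]
    prefix_atoms = prefix.split(".")
    return primary[: len(prefix_atoms)] == prefix_atoms


if __name__ == "__main__":
    print(parse_ucd("pos.eq.ra;meta.main"))
    print(primary_has_prefix("pos.eq.ra;meta.main", "pos.eq"))
    print(primary_has_prefix("phot.mag;em.opt.V", "pos"))
```

A check such as primary_has_prefix lets a client pick out, say, all the position-like columns in a catalogue without knowing the column names the archive happened to choose.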
* Other projects
I will also briefly discuss other semantically-oriented projects working with IVOA technologies.
The Explicator project aims to avoid the expense and complication of creating consensus data models by helping data centres make their data available in a data model which is natural to them (for which we can read 'inexpensive'), and declaring mappings to well-known data models.
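The general idea of declaring a mapping, rather than re-engineering an archive around a consensus model, can be sketched as follows. This is only an illustration using a plain dictionary. It is not a description of how Explicator itself expresses mappings, and every field name on both the local and the common side is hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping declared by a data centre:
# local field name -> (common-model field name, value converter).
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]

LOCAL_TO_COMMON: Mapping = {
    "RAJ2000_deg": ("position.ra_deg", float),
    "DEJ2000_deg": ("position.dec_deg", float),
    "name": ("object.name", str),
}


def to_common_model(record: Dict[str, Any], mapping: Mapping) -> Dict[str, Any]:
    """Translate one local record into the common model.

    Fields without a declared mapping are dropped; the local
    record itself is left untouched.
    """
    translated = {}
    for local_key, value in record.items():
        if local_key in mapping:
            common_key, convert = mapping[local_key]
            translated[common_key] = convert(value)
    return translated


if __name__ == "__main__":
    local_record = {
        "RAJ2000_deg": "83.63",
        "DEJ2000_deg": "22.01",
        "name": "Crab",
        "internal_id": 7,  # no mapping declared, so not exported
    }
    print(to_common_model(local_record, LOCAL_TO_COMMON))
```

The data centre keeps its natural (inexpensive) representation, and the only thing it has to maintain for interoperability is the mapping table.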
The Semantic Knowledge Underpinning Astronomy (SKUA) project is prototyping a distributed network of semantically aware shared annotated services (in the form of RDF stores). This semantic layer will support a cluster of applications which will either support users directly in finding and recovering useful resources, or support them indirectly by underpinning user-facing applications, including a Facebook-like astronomical Virtual Research Environment (VRE).
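As a rough illustration of what a network of annotation stores involves, the sketch below models each store as a set of RDF-style (subject, predicate, object) triples and answers a pattern query across several stores. It is a toy in-memory model, not SKUA's implementation. The class and function names and the "ex:" identifiers are all hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class AnnotationStore:
    """A toy in-memory stand-in for one RDF store of annotations."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def federated_match(
    stores: Iterable[AnnotationStore],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Run the same pattern against every store and merge the results."""
    merged: Set[Triple] = set()
    for store in stores:
        merged.update(store.match(subject, predicate, obj))
    return sorted(merged)


if __name__ == "__main__":
    alice, bob = AnnotationStore(), AnnotationStore()
    alice.add("ex:crab-nebula", "ex:hasTag", "supernova remnant")
    bob.add("ex:crab-nebula", "ex:hasTag", "pulsar wind nebula")
    bob.add("ex:cas-a", "ex:hasTag", "supernova remnant")
    print(federated_match([alice, bob], subject="ex:crab-nebula"))
```

A user-facing application would sit on top of a query like federated_match, so that annotations made by one user can help another find the same resource.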
List of abbreviations:
FITS = Flexible Image Transport System
IVOA = International Virtual Observatory Alliance
RDF = Resource Description Framework
SKUA = Semantic Knowledge Underpinning Astronomy
STP = Solar-Terrestrial Physics
UCD = Unified Content Descriptor
VO = Virtual Observatory
VRE = Virtual Research Environment
W3C = World Wide Web Consortium