SEMANTIC WEB TECHNOLOGIES FOR VALUE ADDED SERVICES AT THE GFZ ISDC
The GFZ ISDC portal provides retrievable geo-monitoring data, information and knowledge through a metadata-based catalog system. Searchable metadata related to individual data products are stored in tables, whereas product-type-dependent metadata are represented by XML documents in the Directory Interchange Format (DIF) of NASA's Global Change Master Directory (GCMD). The architecture, the operation and the metadata philosophy of the GFZ ISDC portal, as well as its general and project-related scientific background, are described in detail in "GFZ ISDC - Portal to geoscientific data, information and knowledge".
Semantic Web technology shall be used to provide new and extended access to data, information and knowledge at the GFZ ISDC, and to make references and correlations between different classes of metadata (documents) visible. In addition, interoperable ISDC portal web services based on semantic relations, such as a catalog service (CSW), a map service (WMS) or discovery and registry services, shall be realized using standardized metadata documents, structures, concepts and languages such as OGC/ISO 191xxx, XML, RDF, SKOS or OWL. Multi-domain and inter-domain collaboration services can only be realized by adding validated semantics in the form of controlled vocabularies, such as NASA's universal GCMD science keywords and associated directory keywords, or the marine-data-related vocabulary of the NDG (Natural Environment Research Council Data Grid). In addition to controlled vocabularies driven by organizations or committees, generally accepted free vocabularies (folksonomies) are becoming increasingly important.
Figure 1: Semantic Web layered architecture
Semantic Web technologies are based on Uniform Resource Identifiers (URIs) and XML. The Resource Description Framework (RDF) and the related RDF Schema provide the means to represent semantics through ontologies. This approach is shown in the Semantic Web layered architecture in Figure 1.
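To make the role of URIs and RDF more concrete, the following minimal sketch shows how a single statement about a resource could be held as a subject-predicate-object triple and written out as one N-Triples line. The base URI and the resource name are hypothetical and chosen only for illustration; the Dublin Core "title" property is used as an example predicate and is not taken from the ISDC concept.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str              # URI of the resource being described
    predicate: str            # URI of the property
    obj: str                  # URI or plain literal value
    obj_is_uri: bool = False  # True if obj should be written as a URI


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line.

    Only backslashes and double quotes are escaped in literals;
    a complete serializer would also handle control characters.
    """
    if t.obj_is_uri:
        obj = f"<{t.obj}>"
    else:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Hypothetical base URI and resource name, for illustration only.
BASE = "http://example.org/isdc/"
triple = Triple(
    subject=BASE + "product-type/example",
    predicate="http://purl.org/dc/terms/title",
    obj="Example product type",
)
line = to_ntriples(triple)
print(line)
```

In practice an RDF library would normally take over parsing and serialization; the sketch only illustrates the data model.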
Appropriate, standardized languages for the representation and processing of knowledge are OWL and SKOS. Tools for creating, managing and processing ontologies include Protégé, Altova SemanticWorks, SWOOP and CmapTools. The new ISDC DIF* standard related metadata concept was modelled with CmapTools.
As already mentioned, the GFZ ISDC product-type-dependent metadata are stored in DIF-compliant XML documents. Each product type is described by one metadata document of the class product type DIF. This class has eight mandatory attributes, e.g. entry id, entry title and parameters, and more than 25 optional attributes, e.g. summary or reference. An analysis of the structure of this class shows that, in line with the GCMD's approach, some attributes can themselves be treated as autonomous classes with attributes of their own. This means that, in addition to the product type class, new metadata classes and relations can be created. The unique and discrete data products (data files) are described by an extension of the DIF standard, which is not the subject of this paper but is described in detail in "DIF Metadata structure and handling at the GFZ ISDC".
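As an illustration of how mandatory and optional attributes of a product type document might be checked, the sketch below parses a small XML fragment and reports which mandatory elements are missing. It is deliberately incomplete: only the three mandatory attributes named in the text are listed, the element spellings (Entry_ID, Entry_Title, Parameters, Summary) are assumptions, and the sample document is invented.

```python
import xml.etree.ElementTree as ET
from typing import List

# Partial list: the text names only three of the eight mandatory attributes.
# The element spellings are assumptions made for this sketch.
MANDATORY_NAMED = ("Entry_ID", "Entry_Title", "Parameters")

# Invented sample document; it intentionally lacks a Parameters element.
SAMPLE_DIF = """
<DIF>
  <Entry_ID>EXAMPLE-PRODUCT-TYPE</Entry_ID>
  <Entry_Title>Example product type</Entry_Title>
  <Summary>Optional free-text description.</Summary>
</DIF>
"""


def missing_mandatory(xml_text: str) -> List[str]:
    """Return the mandatory element names absent from the document root."""
    root = ET.fromstring(xml_text.strip())
    return [tag for tag in MANDATORY_NAMED if root.find(tag) is None]


missing = missing_mandatory(SAMPLE_DIF)
print(missing)
```

A schema-based validation would be the more robust choice for real documents; the explicit list is used here only to keep the example short.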
Figure 2: ISDC DIF Metadata Concept
Figure 2 shows the structure chosen for the new ISDC DIF* standard related metadata concept. The concept consists of concept nodes, which in our case are metadata classes or attributes of classes, and of relations between pairs of concept nodes. These relations are visualized by arrows and the corresponding linking phrases. Two concept nodes together with the relation between them always express one ISDC metadata proposition. Although the different shapes and colors of the concept nodes represent specific features of these nodes, the figure does not show all ISDC metadata classes, attributes and relations. For example, it does not distinguish between mandatory and optional classes and attributes.
The main ISDC metadata class, product type, is located at the center of the concept. Important features of the product type are referenced by attributes such as parameter (science keyword), reference, citation, free keyword, entry title, summary and entry id. A proposition consisting of the two concept nodes product type and entry id and the associated linking phrase "has unique" can be read as follows: a product type has a unique identifier. Another proposition is: a product type has a citation, which reflects the relation between a producer of data products and the type of data products. The proposition for the attribute reference expresses the relation between a product type and the use of related data products in scientific publications. Further main attributes are project, platform (observatory or instrument carrier) and instrument (sensor), which are also represented by autonomous classes. To keep the concept clear and simple, attributes such as data center, personnel, quality and others are not shown here.
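The propositions described above can be sketched as a simple data structure: each proposition links a source concept node to a target concept node through a linking phrase. The linking phrase "has unique" is taken from the text; the phrase "has" for the citation proposition is an assumption, and the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class Proposition(NamedTuple):
    source: str          # concept node the arrow starts from
    linking_phrase: str  # label on the arrow
    target: str          # concept node the arrow points to


PROPOSITIONS = [
    Proposition("product type", "has unique", "entry id"),
    # The linking phrase "has" is assumed; the text gives only the reading.
    Proposition("product type", "has", "citation"),
]


def read(p: Proposition) -> str:
    """Render a proposition as a plain sentence."""
    return f"A {p.source} {p.linking_phrase} {p.target}."


def related_nodes(node: str, propositions: List[Proposition]) -> List[str]:
    """List the concept nodes reachable from `node` in one step."""
    return [p.target for p in propositions if p.source == node]


for p in PROPOSITIONS:
    print(read(p))
print(related_nodes("product type", PROPOSITIONS))
```

Because each proposition has the same three-part shape as an RDF statement, a list like this could later be mapped to triples with URIs for nodes and linking phrases.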
In addition to the ISDC metadata classes and their relations, the concept also contains concept nodes for controlled and free vocabularies and for the different sources of these vocabularies. The content of the product type attribute parameter, as well as the content of the main attributes/classes such as project, platform and instrument, is represented by controlled vocabularies. The ISDC metadata concept uses the GCMD's science keywords and associated directory keywords as controlled vocabularies. Sources of free vocabularies are free keywords or keyword types generated by data providers or users, which can be used in addition to controlled vocabularies. In our case, the non-DIF keyword type application (a new attribute of the class product type) and its content are introduced by a user. The generation of generally accepted free vocabularies can be supported by Web 2.0 technologies such as collaborative and social tagging.
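The distinction between controlled and free vocabularies can be illustrated with a small sketch that separates the keywords attached to a product type into terms found in a controlled vocabulary and remaining free keywords. The vocabulary entries and keywords below are placeholders, not actual GCMD terms, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Placeholder terms standing in for a controlled vocabulary.
CONTROLLED_VOCABULARY = {
    "EXAMPLE TOPIC > EXAMPLE TERM A",
    "EXAMPLE TOPIC > EXAMPLE TERM B",
}


def normalize(keyword: str) -> str:
    """Collapse whitespace and ignore case when comparing keywords."""
    return " ".join(keyword.split()).upper()


def split_keywords(
    keywords: Iterable[str], vocabulary: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (controlled, free) keywords, preserving input order."""
    vocab = {normalize(term) for term in vocabulary}
    controlled: List[str] = []
    free: List[str] = []
    for keyword in keywords:
        if normalize(keyword) in vocab:
            controlled.append(keyword)
        else:
            free.append(keyword)
    return controlled, free


controlled, free = split_keywords(
    ["example topic >  example term a", "user supplied tag"],
    CONTROLLED_VOCABULARY,
)
print(controlled, free)
```

Free keywords that many users agree on could then be reviewed as candidates for a generally accepted vocabulary.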
The new ISDC DIF metadata concept can also be used to create new, more abstract product type classes with a higher-level scope. For example, GFZ ISDC orbit product types such as Rapid Science Orbit, Predicted Orbit or Precise Orbit, derived from different satellite missions (projects), can be described in a general way by the new product type Orbit. This not only makes it possible to link the different unique product types, but also, in the future, to create a more general searchable index for orbit products within the GFZ ISDC portal.
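The grouping of specific product types under a more abstract one can be sketched as a mapping from each specific type to its general type, from which a simple index is built. The product type names come from the text; the mapping structure and function name are assumptions for illustration, comparable in spirit to a broader/narrower relation in a thesaurus.

```python
from collections import defaultdict
from typing import Dict, List

# Specific product type -> more abstract product type.
GENERALIZES = {
    "Rapid Science Orbit": "Orbit",
    "Predicted Orbit": "Orbit",
    "Precise Orbit": "Orbit",
}


def build_index(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the mapping: general type -> sorted list of specific types."""
    index: Dict[str, List[str]] = defaultdict(list)
    for specific, general in mapping.items():
        index[general].append(specific)
    return {general: sorted(items) for general, items in index.items()}


index = build_index(GENERALIZES)
print(index["Orbit"])
```

A search for the general type Orbit could then be expanded to all specific orbit product types listed under it.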