HETEROGENEOUS DATA INTEGRATION FOR A GLOBAL MAPPING OF SEABED SUBSTRATES
The integration extends across numeric and linguistic data types and across point, vector and raster data topologies. Each form of incoming data requires its own processing method, although recurring data types and standards allow those methods to be reused, which brings efficiencies.
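One way to picture "a method per data form, reused whenever that form recurs" is a registry of handlers keyed by topology and data type. The sketch below is hypothetical: the registry, handler names and record fields are illustrative assumptions, not the system's actual code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: one handler per (topology, data type) combination.
# Once a handler exists, every later dataset in the same form reuses it.
Handler = Callable[[dict], List[dict]]
HANDLERS: Dict[Tuple[str, str], Handler] = {}

def register(topology: str, data_type: str):
    def wrap(fn: Handler) -> Handler:
        HANDLERS[(topology, data_type)] = fn
        return fn
    return wrap

@register("point", "numeric")
def point_numeric(ds: dict) -> List[dict]:
    # e.g. a table of laboratory-analysed grab samples
    return [{"lat": r["lat"], "lon": r["lon"], **r["values"]} for r in ds["records"]]

@register("point", "linguistic")
def point_linguistic(ds: dict) -> List[dict]:
    # e.g. verbal sediment descriptions at sample sites, parsed in a later step
    return [{"lat": r["lat"], "lon": r["lon"], "description": r["text"]}
            for r in ds["records"]]

def ingest(ds: dict) -> List[dict]:
    key = (ds["topology"], ds["data_type"])
    if key not in HANDLERS:
        raise ValueError(f"no handler yet for {key}")
    return HANDLERS[key](ds)
```

A new vector or raster format needs one new handler; every further dataset in that format then costs nothing extra to ingest.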
Several design principles were key to success. Outputs are targeted to a stable ‘top-20’ of parameters that nearly all clients require, such as grain size, composition, color and strength. The input data are held as structured documents that preserve the structure of the incoming data but can be computed to produce integrated outputs with QA/QC applied. This yields large efficiencies, which are essential for data aggregation projects on this scale. The approach is also versatile: outputs are improved or extended by adjusting the software rather than the stored inputs.
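A minimal sketch of "structured documents computed into outputs with QA/QC" follows, assuming an input record that keeps its original field names and units. The document layout, field names, output names and the phi range used for the QA check are all illustrative assumptions.

```python
import math
from typing import Optional

# Hypothetical input document: fields are kept exactly as delivered
# (original names and units), not pre-converted, so outputs can be recomputed.
doc = {
    "source": "example cruise report",   # illustrative
    "sample": "GS-014",
    "fields": {"Median (mm)": 0.25, "Mud %": 8.0, "Munsell": "5Y 4/2"},
}

def median_phi(fields: dict) -> Optional[float]:
    """Median grain size in phi units, with a simple QA range check."""
    mm = fields.get("Median (mm)")
    if mm is None or mm <= 0:
        return None                            # missing or impossible: withheld
    phi = -math.log2(mm)                       # Krumbein phi scale
    return phi if -10 <= phi <= 15 else None   # QA: illustrative plausible range

def mud_pct(fields: dict) -> Optional[float]:
    v = fields.get("Mud %")
    return v if v is not None and 0 <= v <= 100 else None  # QA: valid percentage

# Output parameters are just functions over the stored documents.
OUTPUTS = {"grainsize_phi": median_phi, "mud_pct": mud_pct}

def compute_outputs(d: dict) -> dict:
    return {name: fn(d["fields"]) for name, fn in OUTPUTS.items()}
```

Here compute_outputs(doc) gives a median of 2.0 phi and 8.0 % mud. Improving or adding an output means editing OUTPUTS and rerunning; the stored documents are never altered.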
Thousands of variant parameters exist in the incoming data; for grain size alone there are medians, averages, deviations (graphic and moment), modes and fractional percentages. We therefore nominate a standard for each system output parameter, analogous to the platinum “metre bar”, and calibrate the inputs against it. An input parameter is accepted if its statistical difference from the standard falls within acceptable uncertainties; otherwise it is rejected.
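The abstract does not specify the statistical test, so the sketch below swaps in one simple possibility: on samples where both a variant parameter (say, a graphic mean in phi) and the nominated standard were measured, fit a least-squares line and accept the variant only if the RMS residual is within a chosen tolerance. The function name, the regression approach and the tolerance are assumptions.

```python
import math
from typing import List, Optional, Tuple

def calibrate(variant: List[float], standard: List[float],
              tolerance: float) -> Optional[Tuple[float, float, float]]:
    """Fit standard ~ a + b * variant by least squares on paired samples.

    Returns (a, b, rms_residual) if the residual scatter is within the
    tolerance; returns None if the variant parameter is rejected.
    """
    n = len(variant)
    if n < 3 or n != len(standard):
        raise ValueError("need at least 3 paired samples of equal length")
    mx = sum(variant) / n
    my = sum(standard) / n
    sxx = sum((x - mx) ** 2 for x in variant)
    if sxx == 0:
        raise ValueError("variant values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(variant, standard))
    b = sxy / sxx
    a = my - b * mx
    rms = math.sqrt(sum((y - (a + b * x)) ** 2
                        for x, y in zip(variant, standard)) / n)
    return (a, b, rms) if rms <= tolerance else None
```

An accepted variant value x is then reported on the standard's scale as a + b * x; a rejected variant is left out of that output.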
Word-based data are exceedingly valuable and abundant for seabed characterization. We apply a dictionary that uses fuzzy set theory to assign meanings to terms. The linguistic descriptions are then treated as arithmetic expressions whose terms are summed to give estimates of selected seabed properties (texture, components, color, strength).
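To illustrate treating a description as an arithmetic expression, the sketch below uses a simplified weighted summation rather than a full fuzzy-set formulation: each dictionary term maps to a prototype texture (gravel, sand, mud percentages) and a weight, modifiers scale the next term, and the weighted terms are summed and normalized. The lexicon entries, weights and modifier values are hypothetical.

```python
from typing import Dict, Tuple

Vec = Tuple[float, float, float]  # (gravel %, sand %, mud %)

# Hypothetical dictionary: nouns dominate, adjectival forms are minor components.
LEXICON: Dict[str, Tuple[Vec, float]] = {
    "gravel":   ((100.0, 0.0, 0.0), 1.0),
    "sand":     ((0.0, 100.0, 0.0), 1.0),
    "mud":      ((0.0, 0.0, 100.0), 1.0),
    "gravelly": ((100.0, 0.0, 0.0), 0.3),
    "sandy":    ((0.0, 100.0, 0.0), 0.3),
    "muddy":    ((0.0, 0.0, 100.0), 0.3),
}
MODIFIERS = {"slightly": 0.5, "very": 1.5}  # scale the weight of the next term

def texture(description: str) -> Tuple[float, ...]:
    total = [0.0, 0.0, 0.0]
    wsum = 0.0
    scale = 1.0
    for word in description.lower().replace(",", " ").split():
        if word in MODIFIERS:
            scale *= MODIFIERS[word]
            continue
        if word in LEXICON:
            vec, w = LEXICON[word]
            w *= scale
            total = [t + w * v for t, v in zip(total, vec)]  # add weighted term
            wsum += w
        scale = 1.0  # a modifier applies only to the word that follows it
    if wsum == 0:
        raise ValueError("no recognised texture terms")
    return tuple(t / wsum for t in total)  # normalize back to percentages
```

Under these weights, texture("sandy mud") gives about 23 % sand and 77 % mud, and "slightly gravelly sand" gives about 13 % gravel and 87 % sand.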