A SYSTEM FOR FAST SPATIAL SEARCHES ON THE EARTH OR SKY USING THE HIERARCHICAL TRIANGULAR MESH
We address two separate mathematical issues. The first concerns the correct abstract mathematical representation of complex spherical regions, and the implementation of various set operations (union, intersection and difference), morphological functions (dilation and erosion) and scalar functions (area). Beyond optimal algorithms for these operations there are also serious challenges related to numerical precision (are these two planes exactly parallel?). The second is a discrete pixelization of the sphere: for fast spatial searches we need data structures that support fast search algorithms. We use a sphere quadtree to build a hierarchy of triangle-shaped pixels (trixels) on the globe, hence the name: Hierarchical Triangular Mesh (HTM).
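To illustrate the construction, here is a minimal sketch in Python (not the library's actual code) of the subdivision step, assuming the usual HTM construction that starts from the eight spherical triangles of an octahedron and recursively splits each trixel at the normalized midpoints of its edges:

import math

def midpoint(a, b):
    # normalized midpoint of two unit vectors: the new vertex lies on the sphere
    m = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    r = math.sqrt(m[0]**2 + m[1]**2 + m[2]**2)
    return (m[0]/r, m[1]/r, m[2]/r)

def subdivide(v0, v1, v2):
    # split one trixel into its four children by connecting the edge midpoints
    w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
    return [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]

# one of the eight level-0 trixels, spanned by three octahedron vertices
children = subdivide((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))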
There are many ways to represent shapes on the surface of the unit sphere. We chose a system with no singularities at the poles, employing three-dimensional Cartesian unit vectors to describe locations on an idealized (unit-radius) sphere. The most basic building block is the halfspace, a directed plane that splits 3-D space into two halves. It is parameterized by a normal vector n and a scalar c (the distance of the plane from the origin). When intersected with the unit sphere, a halfspace describes a spherical cap. Inclusion of a point inside such a cap is decided by a simple dot-product computation.
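As a minimal sketch (not the library's actual API), the dot-product test in Python, representing a halfspace by its unit normal n and offset c; a point p on the unit sphere lies inside the corresponding cap when n . p >= c:

import math

def halfspace_contains(n, c, p):
    # p is inside the cap cut out by the plane (n, c) when the dot product reaches c
    return n[0]*p[0] + n[1]*p[1] + n[2]*p[2] >= c

# a cap of radius 10 degrees around the north pole: n points at the pole, c = cos(10 deg)
n, c = (0.0, 0.0, 1.0), math.cos(math.radians(10.0))
print(halfspace_contains(n, c, (0.0, 0.0, 1.0)))   # True: the pole itself
print(halfspace_contains(n, c, (1.0, 0.0, 0.0)))   # False: a point on the equator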
A so-called "convex" is the intersection of one or more halfspaces. A rectangle, for example, is described by a convex of four halfspaces, the sides of the rectangle. A region, the most general shape, is simply a union of convexes.
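Containment then composes in the obvious way; a hedged sketch, with each halfspace written as a (normal, offset) pair as above:

def convex_contains(halfspaces, p):
    # a convex is the intersection of halfspaces: p must satisfy every one of them
    return all(n[0]*p[0] + n[1]*p[1] + n[2]*p[2] >= c for n, c in halfspaces)

def region_contains(convexes, p):
    # a region is a union of convexes: p must fall inside at least one of them
    return any(convex_contains(halfspaces, p) for halfspaces in convexes)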
The normal form of a region is therefore a union of intersections. Very often shapes are more conveniently represented by other well-known primitives, such as rectangles, spherical polygons, circles, or the convex hull of a point set, and we provide a full complement of shape functions that convert descriptions given in these familiar terms into the normal form.
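For example, two such shape functions could look like the following sketch (hypothetical names, not the library's interface): a small circle becomes a single halfspace whose normal is the center and whose offset is the cosine of the radius, and a Lat/Lon-aligned rectangle (narrower than 180 degrees in longitude) becomes a convex of four halfspaces, two small circles bounding the latitude and two great circles bounding the longitude:

import math

def circle_to_convex(lon_deg, lat_deg, radius_deg):
    # a circle is one halfspace: normal = unit vector of the center, offset = cos(radius)
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    n = (math.cos(lat)*math.cos(lon), math.cos(lat)*math.sin(lon), math.sin(lat))
    return [(n, math.cos(math.radians(radius_deg)))]

def rect_to_convex(lon_min, lon_max, lat_min, lat_max):
    # a Lat/Lon rectangle as the intersection of four halfspaces
    l0, l1 = math.radians(lon_min), math.radians(lon_max)
    return [
        ((0.0, 0.0,  1.0),  math.sin(math.radians(lat_min))),   # lat >= lat_min
        ((0.0, 0.0, -1.0), -math.sin(math.radians(lat_max))),   # lat <= lat_max
        ((-math.sin(l0),  math.cos(l0), 0.0), 0.0),             # lon >= lon_min
        (( math.sin(l1), -math.cos(l1), 0.0), 0.0),             # lon <= lon_max
    ]

# the normal form of a region is then just a list of such convexes
region = [circle_to_convex(12.0, -20.0, 0.5), rect_to_convex(10.0, 20.0, -30.0, -10.0)]

Passing such a region and a unit vector to the region_contains sketch above then decides inclusion.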
As always, the devil is in the details. Floating-point computations are subject to rounding errors, and a number of mathematical issues arise from these inherent uncertainties in our project as well. When should two points or halfspaces be considered identical? When is a point exactly on an arc? Computational geometry libraries working in Euclidean space sometimes avoid these issues by using exact arithmetic on rational numbers represented by integer numerators and denominators; this slower workaround is not an option for spherical geometry because of the normalization constraint, which involves taking a square root. We use IEEE-754 double-precision numbers and derive the limitations from the actual number representation. At the core of many formulas is the cross product used to find perpendicular directions. Co-linear vectors have a vanishing cross product, and numerically this can be tested in a robust way by comparing the largest coordinate of the cross product to the double-precision tolerance: if it is too small, the vector cannot be reliably normalized by taking the square root, and the indefinite perpendicular direction signals co-linearity. A similar problem arises when solving for the intersection points ("roots") of two circles on the sphere: if the computation is inaccurate, the roots will not lie on the circles numerically. The sweet spot for double-precision numbers is at a tolerance level of about 10^-8 radians, which corresponds to a few thousandths of an arcsecond, or a few centimeters on the surface of the Earth.
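A hedged sketch of the co-linearity test described above, using the roughly 10^-8 radian tolerance (the constant is illustrative, not the library's exact value):

import math

TOLERANCE = 1.0e-8   # angular tolerance in radians, a few thousandths of an arcsecond

def perpendicular(a, b, tol=TOLERANCE):
    # cross product of two unit vectors; its length is the sine of the angle between them
    c = (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
    if max(abs(c[0]), abs(c[1]), abs(c[2])) < tol:
        return None   # numerically co-linear: no well-defined perpendicular direction
    r = math.sqrt(c[0]**2 + c[1]**2 + c[2]**2)
    return (c[0]/r, c[1]/r, c[2]/r)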
There are two basic kinds of spatial constraints in a query: (1) "Is this object within some distance of a target?" and (2) "Is this object inside some region?"
Both kinds involve costly geometrical computation, so we devised a mechanism that quickly eliminates objects from consideration. Crude boxing techniques that select candidates by bounding Lat/Lon values work well only if the region of interest is a rectangle aligned with the Lat/Lon grid; they are less effective for other shapes, such as regions that contain a pole or narrow stripes with a high inclination to the primary great circles. The main idea is to implement a coarse filter whose job is to reject all objects that are certain to fail the spatial constraint, and to apply the fine filter only to the few candidates that pass, weeding out the remaining false positives. The goal is to make the coarse filter as selective as possible while keeping it as fast as possible.
In a database that has a user-defined function AngularDistance and a table Objects with columns ObjID, Lat and Lon, a simple query that selects objects within half a degree of the point Lon = 12, Lat = -20 is as follows:
select ObjID from Objects o where AngularDistance(12, -20, o.Lon, o.Lat) < 0.5
The costly AngularDistance function would be evaluated for every object in the table. Instead, if we implement the coarse filter as a table-valued function whose output can be computed efficiently on the fly, the query is augmented with a join as follows:
select ObjID
from Objects o join CircleCover(12, -20, 0.5) c
  on o.HtmID between c.lo and c.hi
where AngularDistance(12, -20, o.Lon, o.Lat) < 0.5
The second example presumes the existence of an HtmID column and a CircleCover table-valued function that returns rows of low and high values constraining the possible HtmID values. Only the objects that pass this first constraint are handed to AngularDistance for the precise calculation. The CircleCover function here produces a table that represents a circular region of interest with the given center and radius; however, our methods are capable of expressing arbitrary shapes with the full richness of the Region-Convex-Halfspace paradigm. The essence of the idea is that the inside of every trixel is turned into a range query over the HtmID values in the database, so we can exploit the relational database engine's ability to use an index on HtmID for very rapid operation.
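The range join works because of how HtmIDs are built: each level of subdivision appends two bits to the parent's ID (child = 4*parent + i), so all leaf-level descendants of any trixel occupy one contiguous block of IDs. A minimal sketch with a hypothetical helper (the leaf depth and ID conventions are implementation details):

def trixel_range(trixel_id, level, leaf_level=20):
    # the (lo, hi) block of leaf-level HtmIDs covered by one trixel at the given level
    shift = 2 * (leaf_level - level)
    return trixel_id << shift, ((trixel_id + 1) << shift) - 1

# a cover is a list of such (lo, hi) pairs, one per trixel touching the region;
# this is what a function like CircleCover returns, and the coarse filter reduces to
# any(lo <= htm_id <= hi for lo, hi in cover)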
With the advent of modern silicon-based detectors, the way observational science is done is changing rapidly. To take full advantage of the avalanche of data, we need scalable information systems that provide reliable access via interoperable interfaces and efficient search capabilities. For a low-cost scientific data analysis infrastructure, all of these components are required at the same time. Our approach is to wed state-of-the-art database and internet technologies to novel algorithmic developments to enable fast and seamless access to collaborative science archives. The framework serves both the Earth and Space Science communities.