Paper No. 14
Presentation Time: 8:00 AM-8:00 PM
EVALUATING THE EFFECTS OF POSITIONING ERRORS ON THE ACCURACY OF SPECIES DISTRIBUTION MODELS USING SYNTHETIC DATA
Species distribution models (SDMs) use presence field data and environmental variables to locate regions suitable for species occurrence. The majority of presence data, especially in tropical countries, come from herbaria and museum collections, which usually contain geographical positioning errors. Even when global positioning systems (GPS) are used, the signal is often difficult to receive under forest cover, which contributes to inaccurate site coordinates. This study used synthetic data to evaluate the sensitivity of three SDMs (BIOCLIM, GARP Best Subsets, and MAXENT) to positioning errors. The use of a synthetic species allowed greater control over the experimental design, including the scale and Gaussian nature of the errors. A simulated fundamental niche was created for a plant species based on a real Amazon tree using geographical information systems techniques. Coordinate displacements at four scales (10 km, 0.25°, 0.5° and 1°) were introduced using two methods: projection of the sample coordinates onto the center point of the containing grid cell, and normally distributed errors in polar coordinate parameters. The first approach reflects changes in grain size and is commonly adopted when dealing with herbarium and museum databases, whereas the second approach simulates the problem of obtaining a GPS signal under the forest canopy. The samples were divided into calibration (n=100) and validation (n=50) subsets, and the effects of the positioning errors were evaluated using four measures: minimum area, confusion matrix (omission and commission errors), kappa index, and area under the curve (AUC) from the receiver operating characteristic (ROC) plot. The performance of each model varied according to the assessment metric and the type of error (cell center projection or polar parameters).
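The two displacement methods described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names are hypothetical, coordinates are treated as planar degrees, and the polar error is assumed to combine a normally distributed distance with a uniformly random bearing, since the abstract does not specify which polar parameters were perturbed.

```python
import math
import random


def snap_to_cell_center(lon, lat, cell_deg):
    """Project a point onto the center of the grid cell that contains it."""
    col = math.floor(lon / cell_deg)
    row = math.floor(lat / cell_deg)
    return ((col + 0.5) * cell_deg, (row + 0.5) * cell_deg)


def displace_polar(lon, lat, sigma_deg, rng):
    """Shift a point by a random polar offset.

    Assumption: the distance is the absolute value of a normal draw with
    standard deviation sigma_deg, and the bearing is uniform in [0, 2*pi).
    """
    distance = abs(rng.gauss(0.0, sigma_deg))
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    return (lon + distance * math.cos(bearing),
            lat + distance * math.sin(bearing))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    samples = [(-60.12, -3.07), (-60.01, -3.01), (-59.40, -2.60)]

    snapped = [snap_to_cell_center(lon, lat, 0.25) for lon, lat in samples]
    # Points that share a cell collapse to one coordinate pair.
    print(len(samples), "samples ->", len(set(snapped)), "distinct cells")

    for lon, lat in samples:
        print(displace_polar(lon, lat, 0.25, rng))
```

In this sketch the first two sample points fall in the same 0.25° cell, so snapping reduces three records to two distinct locations.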
For the cell center projection, it must be considered that samples falling into the same cell are aggregated; the number of samples therefore decreases, which is itself a factor in SDM performance. Considering the AUC, the BIOCLIM model exhibited the greatest decay in performance, whereas the MAXENT model exhibited the least. GARP Best Subsets was determined to be the most resilient, as evidenced by the lowest proportional decrease in performance. A common problem for all models was an increase in predicted area, leading to higher commission errors. These results demonstrate that the properties of each model, knowledge of the dataset origin, and the associated errors must be taken into consideration when interpreting results.
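The confusion-matrix measures named above (omission error, commission error, kappa index) and the AUC can be computed as in the following minimal Python sketch. It assumes that true presence and absence are known at every validation site, which holds for a synthetic species, and it uses common textbook definitions: omission as the share of true presences predicted absent, commission as the share of true absences predicted present. The function names and the small data set are illustrative only and do not reproduce the study's results.

```python
def confusion_counts(observed, predicted):
    """Return (tp, fp, fn, tn) for paired binary observations and predictions."""
    tp = fp = fn = tn = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            tp += 1
        elif pred:
            fp += 1
        elif obs:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def omission_rate(tp, fn):
    """Share of true presences that the model predicted as absent."""
    return fn / (tp + fn)


def commission_rate(fp, tn):
    """Share of true absences that the model predicted as present."""
    return fp / (fp + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between prediction and truth, corrected for chance."""
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    chance_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)


def auc(presence_scores, absence_scores):
    """Probability that a presence site outscores an absence site (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


if __name__ == "__main__":
    observed = [1, 1, 1, 1, 0, 0, 0, 0]   # illustrative truth
    predicted = [1, 1, 1, 0, 1, 0, 0, 0]  # illustrative thresholded prediction
    tp, fp, fn, tn = confusion_counts(observed, predicted)
    print("omission:", omission_rate(tp, fn))
    print("commission:", commission_rate(fp, tn))
    print("kappa:", cohen_kappa(tp, fp, fn, tn))
    print("AUC:", auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```

An increase in predicted area, as reported for all three models, would appear here as more false positives and therefore a higher commission rate.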