OPTIMIZING A REFERENCE SPECTRAL DATABASE FOR RAMAN MINERAL IDENTIFICATION
A spectral database optimized for Raman mineral identification is being created by the following methods. The starting point is 8629 unoriented spectra representing 2216 mineral species, downloaded from RRUFF along with metadata such as analytical conditions, sample identification status, and chemical analysis. Each spectrum was numerically processed to extract the maximum peak height and the RMS continuum noise, and from these a signal-to-noise (S/N) ratio. In addition, a cosine vector-similarity (VS) metric was calculated for every pair of spectra (distinct samples and/or distinct laser wavelengths) from the same mineral species. The S/N data are being used to identify low-S/N spectra that need to be excluded. The VS metric is being used in two ways. High values of the metric are used to identify spectra that are effectively duplicates, which can be excluded. Low values of the VS metric are being used to identify sample-level problems such as fluorescence, impure samples, metamictization, and species misidentification. Finally, work is under way to identify gaps in the coverage of rock-forming and economically important minerals, including gaps in the coverage of geologically important solid-solution series. Additional samples and additional Raman spectra will be obtained to fill these gaps.
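The two per-spectrum and per-pair quantities described above can be illustrated with a minimal Python sketch. It is not the processing code used for this work. It assumes that both spectra have already been resampled onto a common wavenumber grid, that a peak-free continuum window is supplied by the caller, and that the baseline is the mean of that window. The function names and the two cutoff values in `classify_pair` are hypothetical and chosen only for illustration.

```python
import math


def signal_to_noise(intensities, continuum):
    """Maximum peak height above baseline divided by RMS continuum noise.

    `continuum` is a (start, stop) index pair marking a peak-free region.
    How that region is chosen is an assumption of this sketch.
    """
    start, stop = continuum
    region = intensities[start:stop]
    if not region:
        raise ValueError("continuum window is empty")
    baseline = sum(region) / len(region)
    rms_noise = math.sqrt(sum((y - baseline) ** 2 for y in region) / len(region))
    peak_height = max(intensities) - baseline
    return peak_height / rms_noise if rms_noise > 0 else math.inf


def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors on the same grid."""
    if len(a) != len(b):
        raise ValueError("spectra must share a common wavenumber grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero spectrum carries no shape information
    return dot / (norm_a * norm_b)


def classify_pair(vs, duplicate_above=0.98, problem_below=0.5):
    """Label a same-species pair by its VS value (cutoffs are hypothetical)."""
    if vs >= duplicate_above:
        return "duplicate"
    if vs <= problem_below:
        return "possible sample problem"
    return "ok"
```

In practice the cutoffs would be tuned against the observed distribution of VS values within each species, and a pair flagged as a possible sample problem would still need manual inspection to tell fluorescence from an impure or misidentified sample.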