GSA Annual Meeting in Seattle, Washington, USA - 2017

Paper No. 206-6
Presentation Time: 9:20 AM

APPLYING TEXT AND DATA MINING TO GEOLOGICAL ARTICLES: TOWARDS COGNITIVE COMPUTING ASSISTANTS


CLEVERLEY, Paul Hugh, Information Management, Robert Gordon University, 12 Nelson Close, Wallingford, OX10 0LG, United Kingdom, p.h.cleverley@rgu.ac.uk

Text and Data Mining (TDM) is the use of automated analytical techniques to analyse text and data for patterns, trends and other useful information. Cognitive Computing combines TDM with Natural Language Processing (meaning) and Machine Learning (prediction) to ‘mimic’ human thought processes and so augment decision making.

Whilst expert systems rely on pre-defined subject matter associations and are more susceptible to human cognitive biases, data-driven Cognitive Computing apps have the potential to generate new knowledge. Existing TDM literature in the geosciences focuses on testing a priori hypotheses: counting how often specific concepts occur within text and visualizing the results spatially and by Geological Time. Few studies have examined capabilities for prediction and serendipity that take the geoscientist off the beaten track.

Set Theory, word vector and neural network techniques were applied in Python to more than 100,000 articles from the Society of Petroleum Engineers, Geological Society of London, American Geosciences Institute and Society of Economic Geologists (via GeoScienceWorld). This produced stimulants with ‘surprising’ associations to given search terms and Geological Formation analogues. Mixed-methods data were collected from the interactions of 53 geoscientists with these stimulants in two Oil & Gas organizations. Integration with environmental and health databases (e.g. NOAA) was also explored.
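The abstract does not detail how Set Theory was applied to Geological Formation analogues. As a minimal sketch, assuming each formation has already been reduced to a set of descriptive terms mined from the articles that mention it, analogues could be ranked by Jaccard similarity (shared terms divided by all terms). The functions `jaccard` and `rank_analogues`, the formation names and the term sets are hypothetical illustrations, not the study's method or data.

```python
"""Hypothetical sketch: ranking Geological Formation analogues by set overlap.

Assumption: each formation is represented by a set of descriptive terms mined
from the articles that mention it. Names and term sets are placeholders.
"""


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_analogues(target: str, profiles: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return the other formations ordered by similarity to the target."""
    target_terms = profiles[target]
    scores = [
        (name, jaccard(target_terms, terms))
        for name, terms in profiles.items()
        if name != target  # a formation is not its own analogue
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative term profiles only; not data from the study.
    profiles = {
        "Formation A": {"turbidite", "sandstone", "deepwater", "channel"},
        "Formation B": {"turbidite", "sandstone", "fan", "channel"},
        "Formation C": {"carbonate", "reef", "dolomite"},
    }
    for name, score in rank_analogues("Formation A", profiles):
        print(f"{name}: {score:.2f}")  # prints "Formation B: 0.60" then "Formation C: 0.00"
```

Set overlap is transparent and easy to explain to a geologist; a word vector approach would instead compare dense embeddings, which can surface analogues that share few literal terms.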

The results showed that discriminant search term word co-occurrence facilitates serendipity (unexpected, insightful and valuable information encounters) to a greater extent than the techniques deployed in the existing search tools of the organization surveyed. The Geological Formation analogue instrument surfaced interesting analogues previously unknown to experienced geologists.
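One way to make discriminant search term word co-occurrence concrete is lift: how much more often a word appears in documents that contain the search term than in the corpus overall. The sketch below assumes documents are already tokenized into word sets and uses lift as a stand-in measure; the abstract does not state the exact statistic, and the function name `discriminant_cooccurrences`, the `min_docs` threshold and the toy documents are hypothetical.

```python
"""Hypothetical sketch: discriminant word co-occurrence for a search term.

Assumption: "discriminant" is approximated by lift, i.e. how much more often a
word appears in documents containing the search term than in the corpus as a
whole. The abstract does not state the exact measure used.
"""
from collections import Counter


def discriminant_cooccurrences(
    docs: list[set[str]], term: str, min_docs: int = 2, top_n: int = 10
) -> list[tuple[str, float]]:
    """Rank words by lift = P(word | term present) / P(word) over documents."""
    n_docs = len(docs)
    # Document frequency of every word across the whole corpus.
    corpus_df = Counter(word for doc in docs for word in doc)
    # Only the documents that mention the search term.
    hits = [doc for doc in docs if term in doc]
    if not hits:
        return []
    # Document frequency of each co-occurring word within those documents.
    hit_df = Counter(word for doc in hits for word in doc if word != term)
    scores = []
    for word, count in hit_df.items():
        if count < min_docs:  # skip co-occurrences too rare to trust
            continue
        lift = (count / len(hits)) / (corpus_df[word] / n_docs)
        scores.append((word, lift))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    # Toy corpus: each document is the set of words it contains.
    docs = [
        {"turbidite", "channel", "reservoir"},
        {"turbidite", "channel", "porosity"},
        {"reservoir", "porosity", "carbonate"},
        {"reservoir", "seismic"},
    ]
    print(discriminant_cooccurrences(docs, "turbidite", min_docs=1))
```

Here "channel" scores highest (lift 2.0) because it appears in every "turbidite" document but only half the corpus, while the common word "reservoir" scores below 1. Raising `min_docs` trades potential surprise against noise from very rare words.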

Designing for serendipity may not be a widespread design assumption for enterprise search tools used by geoscientists, despite its scientific and commercial benefits. Search-task-driven Cognitive Computing apps may stimulate creativity and be particularly useful for suggesting geoscience analogues. Integrating word vectors with existing data in structured databases may yield new scientific discoveries and is an area for further research.