GSA Annual Meeting in Denver, Colorado, USA - 2016

Paper No. 97-11
Presentation Time: 11:00 AM

EVIDENCE FOR VALIDITY AND RELIABILITY IN GEOSCIENCE EDUCATION RESEARCH: AN EXAMPLE FROM THE INTEGRATE RESEARCH PROJECT


CZAJKA, C. Doug, Marine, Earth, and Atmospheric Sciences, North Carolina State University, Raleigh, NC 27695 and MCCONNELL, David A., Marine, Earth, and Atmospheric Sciences, North Carolina State University, Raleigh, NC 27695, cdczajka@ncsu.edu

An important part of educational research design is choosing how to measure change and deciding which tools are most appropriate for the research questions of interest. When choosing tools to collect data and analyze results, it is important to consider reliability and validity. Many educational research papers cite gains on exam questions or concept inventories as evidence of the efficacy of an instructional intervention, but were the measures reliable, and are the claims based on them valid?

We will discuss the concepts of validity and reliability by applying them to an example from a research study conducted as part of the InTeGrate project (http://serc.carleton.edu/integrate/index.html). The goal of this research is to test the efficacy of student-centered instructional materials designed to teach geoscience in the context of societal issues. Part of the project aims to compare student learning gains in non-InTeGrate courses with gains in courses largely composed of InTeGrate teaching materials. To measure learning gains, a sixteen-question, selected-response pre/post-test was constructed using questions from the Geoscience Literacy Exam (GLE) and the Geoscience Concept Inventory (GCI). Two post-course essay questions from the GLE were also administered. Based on this study example, we will discuss the various types of evidence for validity and how they affect the interpretations and conclusions drawn from results. Using preliminary results from the study, we will discuss and report measures of reliability, including internal consistency, reliability of gains, and inter-rater reliability.
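
As an illustration of two of the reliability measures named above, the following minimal Python sketch computes Cronbach's alpha (one common index of internal consistency for a multi-item test) and Cohen's kappa (one common index of agreement between two raters scoring the same essays). The abstract does not state which statistics the study used, so the choice of these two is an assumption, and the function names and all data values are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student item-score lists.

    Each inner list holds one student's scores on the same k items
    (for example, 0/1 for selected-response questions).
    """
    k = len(scores[0])
    # Variance of each item across students.
    item_vars = [pvariance([student[i] for student in scores]) for i in range(k)]
    # Variance of the students' total scores.
    total_var = pvariance([sum(student) for student in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical scores to the same essays."""
    n = len(rater_a)
    # Proportion of essays on which the raters agree exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's score frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical data only: 4 students x 3 items, and 4 essays scored by 2 raters.
item_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
essay_scores_a = [2, 1, 0, 2]
essay_scores_b = [2, 1, 1, 2]

print(cronbach_alpha(item_scores))
print(cohen_kappa(essay_scores_a, essay_scores_b))
```

Reliability of gains is not covered by this sketch; it depends on how gain scores are defined, which the abstract does not specify.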