GSA Annual Meeting in Denver, Colorado, USA - 2016

Paper No. 97-10
Presentation Time: 10:45 AM

HOW IMPORTANT IS THAT STATISTICALLY SIGNIFICANT RESULT? BEING MINDFUL OF P-VALUES IN SCIENCE EDUCATION RESEARCH (Invited Presentation)


STEMPIEN, Jennifer, Department of Geological Sciences, University of Colorado at Boulder, 2200 Colorado Ave, Boulder, CO 80309, stempien@colorado.edu

Discipline-based education research projects, especially within the STEM fields, have evolved from studies within a single classroom to large efforts that can involve multiple semesters, faculty, and even institutions. With the increase in the size and duration of educational studies and the increasing complexity of the resulting datasets, many science education researchers have turned to quantitative and mixed methods for data analysis and interpretation. Moreover, the call for reproducible results and for educational decisions to be made using evidence-based and validated best practices has added to the need for proof that science education efforts are effective. The use of p-values to validate ideas and research through null hypothesis significance testing (NHST) has become prevalent in education research, affecting interpretation and even whether some results are deemed publishable.

However, the concept of the p-value, and how to properly use and interpret it, has been highly contested within statistics and educational research for over a hundred years. Within the past five years alone, multiple high-profile stories from fields ranging from biomedical to political science have highlighted three problems: studies whose results are significant under NHST but whose interpretations are faulty in context, p-values that change when studies are reproduced, and analyses that are “p-hacked” to obtain what are viewed as statistically significant results. These issues with NHST are also found within discipline-based education research, especially as each discipline has its own statistical culture. This does not mean that NHST is a faulty approach that should be avoided. Instead, discipline-based education researchers need to be more mindful about the design of the project and the specific hypothesis and/or statistical model being tested, be wary of ad hoc multiple testing, incorporate other statistics such as effect size and confidence intervals to support the effectiveness of their study, and not overemphasize or overinterpret p-values.
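As a rough illustration of why effect sizes and confidence intervals belong alongside p-values, the Python sketch below compares two hypothetical course formats using summary statistics only. Everything in it is an assumption made for illustration and does not come from the presentation: the function name `summarize_difference`, the exam-score numbers, the use of Cohen's d as the effect size, and the large-sample normal approximation used for the test and the interval.

```python
from math import sqrt
from statistics import NormalDist


def summarize_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b, alpha=0.05):
    """Summarize the difference in means (a - b) between two groups.

    Assumption of this sketch: a large-sample normal approximation is used
    for both the two-sided p-value and the confidence interval.
    """
    diff = mean_a - mean_b
    # Standard error of the difference between two independent means.
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Cohen's d: the difference expressed in pooled standard deviations.
    pooled_sd = sqrt(
        ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    )
    cohens_d = diff / pooled_sd

    # (1 - alpha) confidence interval for the difference in means.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return {"p_value": p_value, "cohens_d": cohens_d, "ci": ci}


# Hypothetical exam scores (percent); the numbers are invented for illustration.
# Multi-institution study: a 1-point gain with 5000 students per group.
large_study = summarize_difference(71.0, 10.0, 5000, 70.0, 10.0, 5000)
# Single classroom: a 5-point gain with 20 students per group.
small_study = summarize_difference(75.0, 10.0, 20, 70.0, 10.0, 20)

for name, result in [("large", large_study), ("small", small_study)]:
    low, high = result["ci"]
    print(
        f"{name}: p = {result['p_value']:.2g}, "
        f"d = {result['cohens_d']:.2f}, 95% CI = ({low:.2f}, {high:.2f})"
    )
```

Under these invented numbers, the large study yields a very small p-value for a one-point difference (d of about 0.1), while the small study yields a non-significant p-value for a five-point difference (d of about 0.5) with a wide interval. The p-value alone would rank the two results in the opposite order from their practical importance.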