A P-Curve Exercise That Might Restore Some of Your Faith in Psychology
I teach my university's Graduate Social Psychology course, and I start off the semester (as I assume many other professors who teach this course do) by talking about research methods in social psychology. Over the past several years, as the problems with reproducibility have become increasingly central to discussions in the field, my introductory lectures have gradually become more dismal. I've come to think that it's important to teach students that most research findings are likely false, that there is very likely a high degree of publication bias in many areas of research, and that some of our most cherished ideas about how the mind works might be completely wrong.
In general, I think it's hard to teach students what we have learned about the low reproducibility of many of the findings in social science without leaving them with a feeling of anomie, so this year, I decided to teach them how to do p-curve analyses so that they would at least have a tool that would help them to make up their own minds about particular areas of research. But I didn't just teach them from the podium: I sent them away to form small groups of two to four students who would work together to conceptualize and conduct p-curve analysis projects of their own.
I had them follow the simple rules that are specified in the p-curve user's guide, which can be obtained here, and I provided a few additional ideas that I thought would be helpful in a one-page rubric. I encouraged them to make sure they were sampling from the available population of studies in a representative way. Many of the groups cut down their workload by consulting recent meta-analyses to select the studies to include; others used Google Scholar or Medline. They were all instructed to follow the p-curve manual chapter and verse, and to write a short paper summarizing their findings. The students told me that they were able to produce their p-curve analyses (and the short papers) in 15-20 person-hours or less.
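For readers who want to see the nuts and bolts: the heart of p-curve is a test of whether the statistically significant p-values in a literature are "right-skewed," that is, whether they pile up near zero, as they should if the studies are tracking real effects. Here is a minimal sketch, in Python, of that right-skew test, combined across studies with Stouffer's method. The function name and the example p-values are my own inventions for illustration, and the official p-curve app does considerably more (half p-curves, a flatness test against 33% power, robustness checks), so treat this as a sketch of the logic rather than a substitute for the app or the user's guide.

```python
# A minimal sketch of p-curve's right-skew test (the Stouffer version).
# Assumes the two-tailed p-values have already been extracted from the
# studies per the p-curve user's guide. Illustrative only.
from scipy.stats import norm

def right_skew_test(p_values, alpha=0.05):
    """Test whether significant p-values pile up near zero (evidential value).

    Under the null of no effect, p-values that land below alpha are uniform
    on (0, alpha), so pp = p / alpha is uniform on (0, 1). Right skew
    (pp-values clumping near zero) is the signature of a true effect.
    """
    sig = [p for p in p_values if 0 < p < alpha]  # p-curve uses only p < .05
    pp = [p / alpha for p in sig]                 # conditional "pp-values"
    z = [norm.ppf(x) for x in pp]                 # probit-transform each pp
    stouffer_z = sum(z) / len(z) ** 0.5           # combine via Stouffer's method
    return stouffer_z, norm.cdf(stouffer_z)       # one-tailed p for right skew

# A hypothetical literature whose significant p-values hug zero:
z, p = right_skew_test([0.001, 0.002, 0.003, 0.004, 0.005])
print(f"Stouffer Z = {z:.2f}, one-tailed p = {p:.5f}")  # Z is about -3.6, p < .001
```

A strongly negative combined Z like this is what a right-skewed, evidentially valuable literature produces; a literature that owed its significant results to p-hacking would instead show p-values bunched just under .05 and a curve that is flat or left-skewed.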
This past week, all ten groups of students presented the results of their analyses, and their findings were surprisingly (actually, puzzlingly) rosy: All ten of the analyses revealed that the literatures under consideration possessed evidentiary value. Ten out of ten. None of them showed evidence for intense p-hacking. On the basis of their conclusions (coupled with the conclusions that previous meta-analysts had made about the size of the effects in question), it does seem to me that there really is license to believe a few things about human behavior:
(1) Time-outs really do reduce undesirable behavior in children (parents with young kids, take notice);
(2) Expressed Emotion (EE) during interactions between people with schizophrenia and their family members really does predict whether the patient will relapse in the subsequent 9-12 months (based on a p-curve analysis of a sample of the papers reviewed here);
(3) The amount of psychological distress that people with cancer experience is correlated with the amounts of psychological distress that their caregivers manifest (based on a p-curve analysis of a sample of the papers reviewed here);
and
(4) Men really do report more distress when they imagine their partners' committing sexual infidelity than women do (based on a p-curve analysis of a sample of the papers reviewed here; caveats remain about what this finding actually means, of course...)
I have to say that this was a very cheering exercise for my students as well as for me. But frankly, I wasn't expecting all ten of the p-curve analyses to provide such rosy results, and I'm quite sure the students weren't either. Ten non-p-hacked literatures out of ten? What are we supposed to make of that? Here are some ideas that my students and I came up with:
(1) Some of the literatures my students reviewed involved correlations between measured variables (for example, emotional states or personality traits) rather than experiments in which an independent variable was manipulated. They were, in a word, personality studies rather than "social psychology experiments." The major personality journals (Journal of Personality, Journal of Research in Personality, and the "personality" section of JPSP) tend to publish studies with conspicuously higher statistical power than do the major journals that publish social psychology-type experiments (e.g., Psychological Science, JESP, and the two "experimental" sections of JPSP). One implication of this fact, as Chris Fraley and Simine Vazire just pointed out, is that the latter, experiment-friendly journals are likely, ceteris paribus, to have higher false positive rates than the former, personality-oriented journals (I sketch the arithmetic behind this point just after this list).
(2) Some of the literatures my students reviewed were not particularly "sexy" or "faddish"--at least not to my eye. (Biologists refer to the large animals that get the general public excited about conservation and ecology as the "charismatic megafauna." Perhaps we could begin talking about "charismatic" research topics rather than "sexy" or "faddish" ones? It might be perceived as slightly less derogatory...) Perhaps studies on less charismatic topics generate less temptation among researchers to capitalize on undisclosed researcher degrees of freedom? Just idle speculation...
(3) The students went into the exercise without any a priori prejudice against the research areas they chose. They wanted to know whether the literatures they focused on were p-hacked because they cared about the research topics and wanted to base their own research upon what had come before--not because they had read something seemingly fishy on a given topic that gave them the impetus to do a full p-curve analysis. I wonder whether this subjective component of conducting a p-curve analysis will end up being really significant as the technique becomes more popular.
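To put some illustrative numbers on point (1): the arithmetic below is a hedged back-of-the-envelope sketch, not anything taken from Fraley and Vazire's paper. The power values and the assumption that half of all tested hypotheses are true are mine, chosen only to show the direction of the effect.

```python
# Back-of-the-envelope: among published significant results, what fraction
# are expected to be false positives? All numbers are illustrative assumptions.
def false_discovery_rate(power, alpha=0.05, prior_true=0.5):
    # Expected false positives as a share of all significant results,
    # assuming `prior_true` of the tested hypotheses are actually true.
    true_pos = power * prior_true
    false_pos = alpha * (1 - prior_true)
    return false_pos / (true_pos + false_pos)

# Hypothetical average power for experiment-friendly vs. personality journals:
for label, power in [("experimental journals", 0.35), ("personality journals", 0.85)]:
    print(f"{label}: power = {power:.2f}, expected FDR = {false_discovery_rate(power):.3f}")
```

Under these made-up but not crazy assumptions, the lower-powered literature carries roughly twice the false-discovery burden (about 12.5% versus about 5.6% of significant results being false positives), which is the direction of Fraley and Vazire's point, and perhaps one reason the correlational literatures my students examined fared so well.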
If you teach a graduate course in psychology and you're into research methods, I cannot recommend this exercise highly enough. My students loved it; they found it extremely empowering, and it was the perfect positive ending to the course. If you have used a similar exercise in any of your courses, I'd love to hear about what your students found.
By the way, Sunday will be the 1-year anniversary of the Social Science Evolving Blog. I have appreciated your interest. And if I don't get anything up here before the end of 2014, happy holidays.