Bushfires and Open Science: A View From Australia

Our Family Summer in a Time of Fires

We’re just back home after a couple of weeks at the big old family beach house. We had one stinking hot day, 40+ degrees, but, strangely, other days were cool to cold, usually with strong swirly winds. So different from the long spells of searing heat further out to the East. We had only a few beach visits but lots of indoor games and gang self-entertainment by the kids. People came and went, but usually it was a pleasantly chaotic mob of 15 or so people. We watched the Test cricket–beating the Kiwis, yay! And, like the rest of the world, were aghast to see the pictures and hear the reports of the enormous fires up and down the East and South-East coasts.

We were at Anglesea, one of a string of small towns along the Great Ocean Road to the west of Melbourne. There have been big fires down that way in the past, but this year, so far, nothing major, although the peak months of Feb and March are still ahead. We not only enjoyed a family holiday, but–new these last couple of years–also kept a careful eye on the sky to the West, and on the excellent emergency app that pings when any warning is issued for any ‘watch zone’ you care to nominate. So we kept our phones charged and didn’t forget to check that the car had a full tank, and drinking water and blankets on board–and we reminded ourselves where the two evacuation areas in the town are, and how we can most quickly get there. And what we mustn’t forget if we have to load and leave quickly. Note to self: Find the little old battery radio, to keep nearby to hear emergency warnings if the electricity and phone reception both die.

We drove home through smoke haze, two hours, headlights on, never quite adapting to the stinging smell of smoke. The smoke came, we were told, not from the Victorian fires a few hundred km to the East, but from fires in Tasmania a thousand km to the South. Some of the fires have been burning for months, especially up north, and Sydney has had days of dangerous smoke at various times these last several months. Most ominously, it seems that these fires are on track to increase Australia’s already scandalously high carbon emissions by more than 50% over this July-to-June period, and very likely by 100% or more. More than double! A tipping point, anyone?

Yes, life on this planet is changing and we’re all in it together. But what has this to do with Open Science? I think we can learn by listening to climate scientists–not only about the carbon emission reductions we should have been doing 30 years ago and must, with great urgency, be doing now, but also about what good research practice looks like, under pressures that most of us can only imagine.

BTW, I’ll mention The Conversation (‘Academic rigour, journalistic flair’), which is running what seems to me an outstanding series of pieces on the fires. This week, see the Mon, Tues, Wed, and today’s offerings.

Climate Scientists at Work–Implications for Open Science

The best I’ve read recently is Gergis, J. (2018) Sunburnt Country. Carlton, Vic, Australia: Melbourne University Press.

In the acknowledgements, the author first thanks the teachers in the professional writing course she undertook before writing the book. The training shows–her writing is beautiful and compelling. Gergis tells the fascinating story of her research on the history of climate in Australia, going back centuries and more. She draws on data series from ice cores, tree growth rings, and coral growth rings, as well as evidence from early indigenous people and European explorers. She assembles overwhelming evidence that Australia is getting hotter and drier and, what’s more, is now experiencing weather extremes far beyond natural variability. And that human activities over the last century and more are by far the most important cause of global heating.

Comparable analyses, bringing together a wide range of data and applying a number of climate models, have been carried out for the Northern Hemisphere, but the work of Gergis and her many colleagues is a first for the Southern Hemisphere. The stories in the two hemispheres are roughly parallel, yet different, most notably in the evidence that the toxic effect of industrialisation started later down south, not until about the mid-19th century.

I’ll mention several aspects of this–and much other–climate research:

Integration of evidence–similar measures

We are becoming used to using meta-analysis to integrate results based on similar studies using the same or similar measures. Climate scientists do this routinely, sometimes on a very large scale as when different time series of e.g. CO2 concentration are integrated.

Integration of evidence–diverse measures

We sometimes talk about converging approaches as providing a powerful strategy for establishing a robust finding. See for example pp. 414-416 in ITNS. Climate scientists do this on a massive scale, as when multiple data series of quite different types are brought together to build an overall picture of change over time.

Using multiple quantitative models

Some fields in psychology use quantitative modelling, and its use is spreading slowly across the discipline. In stark contrast, the development of highly complex quantitative models is core business for climate researchers. Psychologists might feel that human behaviour, cognition, and feelings are about as complex as anything can get, but climate scientists are attempting to model and understand a planet’s biosphere, whose complexity, I think, comes close to matching. One of our arguments for the new statistics is that estimation is essential for quantitative modelling, so using estimation is an essential step towards a quantitative discipline. Gergis and her team apply multiple climate models to their diverse data sets to account for what has happened in the past and provide believable predictions for the future. You can guess that these predictions are scary beyond belief, unless our practices change drastically and very soon.

Open data and analysis code

It seems to be taken for granted that data sets are openly available, at least to all other researchers. The same for the models themselves, and the analyses carried out by any research group.

Reproducibility and peer scrutiny

Gergis describes how, again and again, any analysis or evaluation of a model by her group is subjected to repetition and intense scrutiny, at first by others within the group, then by other researchers. Only after all issues have been dealt with, and everything repeated and re-examined once more, is a manuscript submitted for publication. Which leads to further exacting peer scrutiny, and possible revision, before publication. It’s exhausting and sobering to read about such a painstaking process.

The Dark Side

You probably know about the hideous harassment meted out to climate researchers. Any aspect of their work can be attacked, with little or no justification. Vitriolic attacks can be personal. They may have to spend vast time and emotional energy responding to meaningless legal or other challenges. This toxic environment is no doubt one reason for the intense scrutiny I described above. We know all that, but even so it’s moving and enraging to read the trials that Gergis and her group have endured.

Large is Good? Small is Good?

Discussions in psychology of p-hacking and cherry-picking almost always assume that large is good: Researchers yearn for effects sufficiently large to be interesting and to gain publication. Open Science stipulates multiple strategies to counter such bias. Climate science offers an interesting contrast: Climate scientists tend to scrutinise their analyses and results, dearly hoping that they have slipped up somewhere and that their estimates of effects are too large. Surely things can’t be this bad? So back they go, checking and double-checking, if anything making their analyses and conclusions more conservative, rather than–as perhaps in psychology–exaggerated.

Researchers Have Emotions too

Of course many psychologists are emotionally committed to their research–we can feel disappointment, frustration, and–with luck–elation. I suspect, however, that rarely are these emotions likely to match the strength of emotions that climate scientists sometimes experience. Besides the personal cost of the vitriolic personal attacks, finding results that map out a hideous future for humankind–including our own children and grandchildren–can be devastating. In a forceful opinion piece a few months ago (The terrible truth of climate change), Gergis wrote:

“Increasingly after my speaking events, I catch myself unexpectedly weeping in my hotel room or on flights home. Every now and then, the reality of what the science is saying manages to thaw the emotionally frozen part of myself I need to maintain to do my job. In those moments, what surfaces is pure grief.”

And so…

All the above may make our Open Science efforts pale. But, as we try to figure out how to improve the trustworthiness of our own research, I think it’s worth pondering the strategies this other discipline has adopted in its own effort to give us results that we simply must believe.


P.S. A warm thank you to those friends and colleagues near and far who have sent enquiries, and messages of concern and support. Very much appreciated. Yes, we are all in this together.

Banishing “Black/White Thinking”

eNeuro publishes some teaching guidance

You may recall that eNeuro published a great editorial and a supporting paper by Bob and me–mainly Bob. Info is here.

It has now published a lovely article giving teaching advice about ways to undermine students’ natural tendency to think dichotomously. If I could wave a magic wand and change a single thing about researchers’ thinking and approach to data analysis, I’d ask the magic fairy to replace Yes/No research questions with ‘How much…?’ and ‘To what extent…?’ questions. Then maybe we could at last move beyond blinkered significant/nonsignificant, yes/no thinking to estimation thinking. The article (pic below) is here.

Literally hundreds of statisticians have rightly called for an end to statistical significance testing (Amrhein et al., 2019; Wasserstein et al., 2019). But the practice of arbitrarily thresholding p values is not only deeply embedded in statistical practice, it is also congenial to the human mind. It is thus not sufficient to tell our students, “Don’t do this.” We must vividly show them why the practice is wrong and its effects detrimental to scientific progress. I offer three teaching examples I have found to be useful in prompting students to think more deeply about the problem and to begin to interpret the results of statistical procedures as measures of how evidence should change our beliefs, and not as bright lines separating truth from falsehood.

In the abstract (above), I love ‘congenial to the human mind’. Yes, we seem to have an inbuilt tendency to think in a black-white way. Overcoming this is the challenge, especially when it has been endlessly reinforced during more than half a century of obeisance to p < .05. I also love the ‘vividly’–surely the best way to get our message across. That’s why I keep banging on about the dance of the p values, and significance roulette. (Search for these at YouTube.)

Scroll down for the interesting bits

Before you click on the pdf link, scroll right down to see the reviewing history of the ms. Bob was the reviewer. The story is a nice example of constructive peer reviewing. Bob and the editor liked the original and made a number of suggestions for strengthening it. The author adopted many of these, but in some cases explained why they were not adopted.

Note also that there are some PowerPoint slides for download, to help the busy teacher.



Farewell and Thanks Steve Lindsay

Psychological Science, the journal, has for years pushed hard for publication of better, more trustworthy research. First there was the leadership of Eric Eich, then Steve Lindsay energetically took the baton. Steve is about to finish, no doubt to his great relief. His ‘swan song’ editorial has just come online, with open access:

Steve starts with a generous reference to a talk of mine at Victoria University in Wellington N.Z. No doubt I talked mainly about the huge variability of p values, and ran the ‘dance of the p values’ demo. (Search for ‘dance of the p values’ at YouTube to find maybe 3 versions.)

Then he gives a brief and modest account of the great strides the journal took under Eric’s and then his own leadership. Indeed, Psychological Science has been vitally important in advancing our discipline! I’m sure, also, that it has had beneficial influence well beyond psychology.

All eyes are now on the incoming editor, Patricia Bauer. To what extent will she keep up the Open Science pressure, the further development of journal policies and practices to keep our field moving towards ever more reproducible and open–and therefore trustworthy and valuable–research?

I join Steve in wholeheartedly wishing her well.


NeuRA Ahead of the Open Science Curve

I had great fun yesterday visiting NeuRA (Neuroscience Research Australia), a large research institute in Sydney. I was hosted by Simon Gandevia, Deputy Director, who has been a long-time proponent of Open Science and The New Statistics.

NeuRA’s Research Quality page describes the quality goals they have adopted, at the initiative of Simon and the Reproducibility & Quality Sub-Committee, which he leads. Not only goals, but strategies to bring their research colleagues on board–and to improve the reproducibility of NeuRA’s output. My day started with a discussion with this group. They described a whole range of projects they are working on to strengthen research at NeuRA–and to assess how quality is (they hope!) increasing.

For example, Martin Heroux described the Quality Output Checklist and Content Assessment (QuOCCA) tool that they have developed, and are now applying to recent past research publications from NeuRA. In coming years they plan to assess future publications similarly–so they can document the rapid improvement!

I should mention that Martin and Joanna Diong run a wonderful blog, titled Scientifically Sound–Reproducible Research in the Digital Age.

It was clear that the R&Q folks, at least, were very familiar with Open Science issues. Would my talk be of sufficient interest for them? Its title was Improving the trustworthiness of neuroscience research (it should have been given by Bob!), and the slides are here. The quality of the questions and discussion reassured me that at least many of the folks in the audience were (a) on board, but also (b) very interested in the challenges of Open Science.

After lunch my ‘workshop’ was actually a lively roundtable discussion, in which I sometimes managed to explain a bit more about Significance Roulette (videos are here and here), demo a bit of Bob’s new esci in R, or join in brainstorming strategies for researchers determined to do better. My slides are here.

Yes, great fun for me, and NeuRA impresses as working hard to achieve reproducible research. Exactly what the world needs.


I Join an RCT: A View From the Other Side

In ITNS we discuss randomized controlled trials (RCTs) and I’ve taught about them since whenever. If done well, they should provide gold standard evidence about the benefits and harms of a therapy. So I was particularly interested to be invited to join a large RCT. My wife, Lindy, and I both elected to join and we are now into the daily ritual of taking a little white tablet, each of us not knowing whether we have the dud or the active version. Weird!

The RCT is StaREE, A Clinical Trial of STAtin Therapy for Reducing Events in the Elderly. Yep, I’m officially ‘elderly’ and have been for a couple of years! It’s an enormous multimillion dollar project aiming for 10,000 participants over something like 8 years. It’s publicly funded, no drug company money involved.

StaREE project description and justification (taken from the registration site)

Statin therapy has been shown to reduce the risk of vascular events in younger individuals with manifest atherosclerotic disease or at high risk of vascular events. However, data derived from meta-analyses of existing trials suggests that the efficacy of statins may decline sharply amongst those over 70-75 years of age. Insufficient patients of this age group have been included in major trials to be certain of the benefit. Within this age group part of the benefit of statin therapy may be offset by adverse effects including myopathy, development of diabetes, cancer and cognitive impairment, all of which are more prevalent in the elderly in any event.

The use of statins in the over 70 age group raises fundamental questions about the purpose of preventive drug therapy in this age group. When a preventive agent is used in the context of competing mortality, polypharmacy and a higher incidence of adverse effects its use should be justified by an improvement in quality of life or some other composite measure that demonstrates that the benefit outweighs other factors.

STAREE will determine whether taking daily statin therapy (40 mg atorvastatin) will extend the length of a disability-free life, determined from survival outside permanent residential care, in healthy participants aged 70 years and above.

Background Reading

There’s a big 2016 review in The Lancet on open access here. A more recent review and meta-analysis is here. These seem to me to support the need for StaREE. Atorvastatin, the cholesterol-lowering drug under study, seems safe and effective, while being cheap and widely known. But most research to date has focussed on folks who have already had some cardiac event, or have high risk factors. There is need for evidence specifically about its possible value for healthy older folks.

My Experience So Far

Of course I first read all that the StaREE website had to offer, and what’s public in the registration of the trial at ClinicalTrials.gov and (more or less the same information) the Australian New Zealand Clinical Trials Registry.

I have asked the researchers for any further information they can give me, including information on:

  • how the sample size of 10,000 was chosen
  • the extent to which the full data and analysis scripts will be open, i.e. on public access
  • further details of the data analysis planned, beyond the long list of measures included in the registration (these first 3 dot points are about Open Science good practice)
  • details of how the safety committee is to operate and, in particular, what criteria it will use in deciding whether the trial should be stopped. (Such a committee is independent of the researchers and sees progressive results, without blinding, so it can monitor any emerging trend suggesting that the therapy is clearly better, or worse, than placebo. What evidence would lead it to stop the trial?)
  • budget

No reply yet–I may report further if I find out more.

At my first appointment a nurse trained in the StaREE protocols explained everything in very simple terms. I signed, including agreement that the researchers could have full access to my medical records, past, present, and future. I did some cognitive tests, mainly of memory. (One of the aims is to assess the extent to which dementia risk might be reduced by the medication.) Then I started the one-month lead-in period, during which I had to take a tablet every day. This was stated to be placebo, despite one of the reviews (links above) arguing that it’s more informative to use the active tablets for all participants during the lead-in. There was a long questionnaire about my medical history, and a blood test, including cholesterol, was ordered.

Then I needed to see my regular doctor, who shared with me the blood test results. My LDL (‘bad’ cholesterol) and HDL (‘good’ cholesterol) levels were well within the normal range, as I expected. She signed to say that none of the StaREE exclusion criteria applied to me and that I could join the trial.

It was made clear that my own health care is first priority, so my doctor can, if she judges it necessary, be unblinded and perhaps advise me to leave the trial. Of course it should work that way, but that’s just one more complication for the researchers.

At my second appointment there were more cognitive tests. Quite amusing, given that I was familiar with some, having taught about them way back. Even so, it’s not so easy to remember the long list of words at first go, and to minimise Stroop interference while quickly reading out the ink colour of incongruent colour words. (Say ‘RED’ on seeing BLUE in red ink.)

Then the big pack of my tablets arrived in the post and I confirmed online that I was starting. One more thing to remember when packing for any travels, even overnight.

Blinding–but perhaps not of participants

According to the registration, the blinding (actually referred to as ‘masking’) is: “Quadruple (Participant, Care Provider, Investigator, Outcomes Assessor)”. Excellent. At present I’m blinded, but at any stage my doctor might order a blood test, including for cholesterol. A distinct drop in my LDL (and perhaps a boost to my HDL) would strongly suggest that I’m on the statin medication. I would, in effect, be unblinded.

I’ve also read about, and had anecdotal reports of small changes that can sometimes be felt when starting statins–call them minor, short-lived side-effects. These might also unblind a participant, although of course anyone aware of such possibilities might judge them to occur when actually starting placebo!

When I return for my next StaREE appointment after 12 months, there will be more cognitive tests and the researchers will order a blood test, but that will specifically not include cholesterol–so the researchers will remain blinded. However, many older folks have blood tests, including for cholesterol, regularly, even annually. And the documentation about Atorvastatin provided by StaREE states that “your cholesterol… levels need to be checked regularly…”. Therefore many participants could potentially know their cholesterol levels, say a year or so into the trial, and therefore potentially be unblinded. I don’t see any way around this. Just one more complication for researchers, and perhaps for the interpretation of results.

Efficacy or Effectiveness?

Efficacy… under ideal circumstances, e.g. in an ideal RCT. Effectiveness… in the real world. See here for A Primer on Effectiveness and Efficacy Trials.

RCTs are easy to criticise for imposing so many restrictions on who can participate–in the interests of minimising nuisance variability–that treatment benefits may be overestimated compared with what’s realistic to expect in everyday clinical practice, no doubt with a more diverse set of patients. StaREE does have a list of exclusion criteria (see the 10 dot points at the registration site) but this case is a little different: The research question asks about possible treatment benefits for a wide range of generally healthy folks; the exclusion criteria are actually not very restrictive. Maybe the efficacy estimated by StaREE won’t be all that different from the effectiveness to be expected if statin use becomes widespread among the healthy elderly. (That word again!)

Compliance is often notoriously low, especially, I would suspect, when a drug is to be taken for ever, and for a not very dramatic–even though worthwhile–reduction in risk of some nasty outcome. In StaREE, volunteers who choose to take part in research may be quite compliant–although perhaps the knowledge that there is only a 50-50 chance that the tablets contain the active ingredient might reduce compliance. Conversely, if StaREE does find evidence that statins are worth taking by healthy older folks, then folks won’t have the special context of a research project, but they will be sure that they are getting the good stuff. I’ll be interested to know StaREE compliance, but I’m unsure how well that will predict real-life compliance.

How Do I Feel?

It’s great that such a study has been funded. My experience so far emphasises what an enormous task it is to plan, set up, and run such a usefully large study. Far more difficult and complex than to write a couple of pages in a textbook about how an RCT should be designed! (A year ago I wrote a post about two enormous and expensive RCTs–the SYNTAX and EXCEL studies–that compared stents and coronary grafts as treatments for heart disease. The two very different approaches turn out, overall, to be about equally good.)

I’m keen to know how closely this StaREE study aligns with Open Science practices.

Of course I’ll be 100% compliant! I’ll take my little white tablets every single day, even if it’s a coin toss whether they are all duds or not. Yes, for sure! Well, a couple of years on, will I still feel this way? I hope so. But consider: 10,000 participants, taking tablets daily for an average of 5 years or so, is approx 18 million tablet-taking moments. And all those moments are needed to find out some simple information: by how much is the risk reduced for stroke, heart attack, etc by taking the little white tablets that do contain (what we hope is) the good stuff?

Most of us probably owe our lives to modern medicine, partly thanks to participants who have consented to join past studies. I’m glad to have the chance to join what looks like a well-done and worthwhile study now.



Replications: How Should We Analyze the Results?

Does This Effect Replicate?

It seems almost irresistible to think in terms of such a dichotomous question! We seem to crave an ‘it-did’ or ‘it-didn’t’ answer! However, rarely if ever is a bald yes-no decision the most informative way to think about replication.

One of the first large studies in psychology to grapple with the analysis of replications was the classic RP:P (Reproducibility Project: Psychology) reported by Nosek and many colleagues in Open Science Collaboration (2015). The project identified 100 published studies in social and cognitive psychology, then conducted a high-powered preregistered replication of each, trying hard to make each replication as close as practical to the original.

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349 (6251) aac4716-1 to aac4716-8.

The authors discussed the analysis challenges they faced. They reported 5 assessments of the 100 replications:

  • in terms of statistical significance (p < .05) of the replication
  • in terms of whether the 95% CI in the replication included the point estimate in the original study
  • comparison of the original and replication effect sizes
  • meta-analytic combination of the original and replication effect sizes
  • subjective assessment by the researchers of each replication study: “Did the effect replicate?”
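
The first, second, and fourth of these assessments can be sketched for a single original/replication pair. Here is a minimal illustration in Python (all effect sizes and standard errors below are invented, and a simple normal approximation stands in for the exact tests RP:P used):

```python
import math

def ci95(est, se):
    """95% CI assuming approximate normality of the estimate."""
    return est - 1.96 * se, est + 1.96 * se

def assess_replication(orig_est, orig_se, rep_est, rep_se):
    """Three RP:P-style assessments for one original/replication pair."""
    # 1. Statistical significance of the replication (two-sided z test)
    z = rep_est / rep_se
    p = math.erfc(abs(z) / math.sqrt(2))
    # 2. Does the replication 95% CI include the original point estimate?
    lo, hi = ci95(rep_est, rep_se)
    covers = lo <= orig_est <= hi
    # 3. Fixed-effect (inverse-variance) combination of the two estimates
    w_o, w_r = 1 / orig_se**2, 1 / rep_se**2
    combined = (w_o * orig_est + w_r * rep_est) / (w_o + w_r)
    return {"rep_p": p, "rep_significant": p < .05,
            "ci_covers_original": covers, "combined_est": combined}

# Hypothetical pair: original d = 0.50 (SE 0.20), replication d = 0.20 (SE 0.10)
result = assess_replication(0.50, 0.20, 0.20, 0.10)
```

Note that the three criteria can easily disagree: in this invented pair the replication is statistically significant, yet its CI excludes the original point estimate–one reason a single yes/no verdict is so unsatisfying.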

Several further approaches to analyzing the 100 pairs of studies have since been published.

Even so, the one-liner that appeared in the media and swept the consciousness of psychologists was that RP:P found that fewer than half of the effects replicated: Dichotomous classification of replications rules! For me, the telling overall finding was that the replication effect sizes were, on average, just half the original effect sizes, with large spreads over the 100 effects. This strongly suggests that reporting bias, p-hacking, or some other selection bias influenced some unknown proportion of the 100 original articles.

OK, how should replications be analyzed? Happily, there has been progress.

Meta-Analytic Approaches to Assessing Replications

Larry Hedges, one of the giants of meta-analysis since the 1980s, and Jacob Schauer recently published a discussion of meta-analytic approaches to analyzing replications:

Hedges, L. V., & Schauer, J. M. (2019). Statistical analyses for studying replication: Meta-analytic perspectives. Psychological Methods, 24, 557-570. http://dx.doi.org/10.1037/met0000189

Formal empirical assessments of replication have recently become more prominent in several areas of science, including psychology. These assessments have used different statistical approaches to determine if a finding has been replicated. The purpose of this article is to provide several alternative conceptual frameworks that lead to different statistical analyses to test hypotheses about replication. All of these analyses are based on statistical methods used in meta-analysis. The differences among the methods described involve whether the burden of proof is placed on replication or nonreplication, whether replication is exact or allows for a small amount of “negligible heterogeneity,” and whether the studies observed are assumed to be fixed (constituting the entire body of relevant evidence) or are a sample from a universe of possibly relevant studies. The statistical power of each of these tests is computed and shown to be low in many cases, raising issues of the interpretability of tests for replication.

The discussion is a bit complex, but here are some issues that struck me:

  • We usually wouldn’t expect the underlying effect to be identical in any two studies. What small difference would we regard as not of practical importance? There are conventions, differing across disciplines, but it’s a matter for informed judgment. In other words, how different could underlying effects be, while still justifying a conclusion of successful replication?
  • Should we choose fixed-effect or random-effects models? Random-effects is usually more realistic, and what ITNS and many other books recommend for routine use. However, H&S use fixed-effect models throughout, to limit the complexity. They report that, for modest amounts of heterogeneity, their results do not differ greatly from what random-effects would give.
  • Meta-analysis and effect size estimation are the focus throughout, but even so the main aim is to carry out a hypothesis test. The researcher needs to choose whether to place the burden of proof on nonreplication or replication. In other words, is the null hypothesis that the effect replicates, or that it doesn’t?
  • One main conclusion from the H&S discussion and the application of their methods to psychology examples is that replication projects typically need many studies (often 40+) to achieve adequate power for those hypothesis tests, and that even large psychology examples are under-powered.

A further sign that the issues are complex is the comment published immediately following H&S with suggestions for an alternative way to think about heterogeneity and replication:

Mathur, M. B., & VanderWeele, T. J. (2019). Challenges and suggestions for defining replication “success” when effects may be heterogeneous: Comment on Hedges and Schauer (2019). Psychological Methods, 24, 571-575. http://dx.doi.org/10.1037/met0000223

H&S gave a brief reply:

Hedges, L. V., & Schauer, J. M. (2019). Consistency of effects is important in replication: Rejoinder to Mathur and VanderWeele (2019). Psychological Methods, 24, 576-577. http://dx.doi.org/10.1037/met0000237

Our Simpler Approach

I welcome the above three articles and would study them in detail before setting out to design or analyze a large replication project.

In the meantime I’m happy to stick with the simpler estimation and meta-analysis approach of ITNS.

Meta-Analysis to Increase Precision

Given two or more studies that you judge to be sufficiently comparable, in particular by addressing more-or-less the same research question, then use random-effects meta-analysis to combine the estimates given by the studies. Almost certainly, you’ll find a more precise estimate of the effect most relevant for answering your research question.
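
As a minimal sketch of what that combination involves, here is the DerSimonian-Laird random-effects procedure in Python (the study estimates and standard errors are invented for illustration; a real analysis would use dedicated software such as esci, or metafor in R):

```python
import math

def random_effects_meta(estimates, ses):
    """DerSimonian-Laird random-effects meta-analysis of k independent estimates."""
    w = [1 / se**2 for se in ses]                                # fixed-effect weights
    fe = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)   # fixed-effect mean
    q = sum(wi * (yi - fe)**2 for wi, yi in zip(w, estimates))   # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)              # between-study variance
    w_re = [1 / (se**2 + tau2) for se in ses]                    # random-effects weights
    est = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return est, (est - 1.96 * se, est + 1.96 * se), tau2

# Three hypothetical studies of the same effect: (estimate, SE) pairs
est, ci, tau2 = random_effects_meta([0.30, 0.45, 0.15], [0.12, 0.15, 0.10])
```

Even with the between-study variance (tau2) added in, the combined CI comes out narrower than any single study’s CI–which is exactly the precision gain the paragraph above promises.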

Estimating a Difference

If you have an original study and a set of replication studies, you could consider (1) meta-analysis to combine evidence from the replication studies, then (2) finding the difference (with CI of course) between the point estimate found by the original study and that given by the meta-analysis. Interpret that difference and CI as you assess the extent to which the replication studies may or may not agree with the original study.

If the original study was possibly subject to publication or other bias, and the replication studies were all preregistered and conducted in accord with Open Science principles, then a substantial difference would provide evidence for such biases in the original study–although other causes couldn’t be ruled out.

Moderation Analysis

Following meta-analysis, consider moderation analysis, especially if DR, the diamond ratio, is more than around 1.3 and you can identify a likely moderating variable. Below is our example from ITNS, in which we assess six original studies (in red) and Bob’s two preregistered replications (in blue). Lab (red vs. blue) is a possible dichotomous moderator. The difference and its CI suggest that publication or other bias may have influenced the red results, although other moderators (perhaps that the red studies were conducted in Germany and the blue in the U.S.) may help account for the red-blue difference.

Figure 9.8. A subsets analysis of 6 Damisch and 2 Calin studies. The difference between the two subset means is shown on the difference axis at the bottom, with its CI. From d subsets.
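For a rough sense of how DR behaves, here is a sketch in Python: DR is the width of the random-effects CI (the diamond) divided by the width of the fixed-effect CI. This sketch uses the DerSimonian-Laird estimate of between-study variance, which may differ in detail from what esci computes, and the data are invented:

```python
import math

def diamond_ratio(effects, ses):
    """Diamond ratio: width of the random-effects meta-analytic CI
    divided by the width of the fixed-effect CI. Since both CIs are
    proportional to their SEs, the ratio of SEs suffices."""
    w = [1 / se**2 for se in ses]
    fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    fe_se = math.sqrt(1 / sum(w))
    # DerSimonian-Laird estimate of between-study variance tau^2
    q = sum(wi * (yi - fe)**2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    re_se = math.sqrt(1 / sum(1 / (se**2 + tau2) for se in ses))
    return re_se / fe_se

# Heterogeneous studies give DR well above 1.3, a cue to seek a moderator;
# perfectly homogeneous studies give DR = 1.
dr_het = diamond_ratio([0.1, 0.9, 0.2, 1.0], [0.1, 0.1, 0.1, 0.1])
dr_hom = diamond_ratio([0.5, 0.5, 0.5], [0.1, 0.1, 0.1])
```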

My overall conclusions are (1) it’s great to see that meta-analytic techniques continue to develop, and (2) our new-statistics approach in ITNS continues to look attractive.


‘Preregistration’ or ‘Registration’?

For years, medicine has urged the ‘registration’ of clinical trials before data collection starts. More recently, psychology has come to use the term ‘preregistration’ for this vital component of Open Science. The ‘pre’ puts it in your face that it happens at the start, but should we fall into line and use the long-established term ‘registration’, for consistency and to avoid possible confusion between disciplines? There is already a move in that direction, in that some psychology journals are accepting Registered Reports, rather than Pre-.

This recent article makes a spirited argument for consistency:

Rice, D. B., & Moher, D. (2019). Curtailing the use of Preregistration: A misused term. Perspectives on Psychological Science, 14, 1105-1108. doi: 10.1177/1745691619858427

Here’s the abstract:

Improving the usability of psychological research has been encouraged through practices such as prospectively registering research plans. Registering research aligns with the open-science movement, as the registration of research protocols in publicly accessible domains can result in reduced research waste and increased study transparency. In medicine and psychology, two different terms, registration and preregistration, have been used to refer to study registration, but applying inconsistent terminology to represent one concept can complicate both educational outreach and epidemiological investigation. Consistently using one term across disciplines to refer to the concept of study registration may improve the understanding and uptake of this practice, thereby supporting the movement toward improving the reliability and reproducibility of research through study registration. We recommend encouraging use of the original term, registration, given its widespread and long-standing use, including in national registries.

Which should we use in ITNS2?

There was a bit of discussion about the issue at the AIMOS Conference, with a range of views presented.

In ITNS we use ‘preregistration’ all through–I’ve been happy with that because the ‘pre-‘ avoids any ambiguity. But what should we use in ITNS2, the second edition we’re just starting to prepare?

No doubt we’ll explain both terms, but my current inclination is to switch and use registration all through. Tomorrow I might think differently. Psychology’s preference may become clearer in coming months.

What do you think? Please comment below or send an email. Thanks!



Congratulations Professor Fiona Fidler!

Just as the fabulous AIMOS Conference — one of Fiona’s most recent triumphs — was wrapping, it was announced officially that Fiona Fidler has been appointed as full PROFESSOR at the University of Melbourne. Wonderful news!

Wow, when Simine Vazire arrives at the University of Melbourne next year, also as professor, the world of non-open science had better watch out!

Hearty congratulations to Fiona!



AIMOS — The New Interdisciplinary Meta-Research and Open Science Association

Association for Interdisciplinary Meta-Research & Open Science (AIMOS)

I had a fascinating two days down at the University of Melbourne last week for the first AIMOS conference. The program is here and you can click through to see details of the sessions.

Congratulations to Fiona Fidler and her team for pulling off such a terrific event! At least 250 folks attended, and a huge range of disciplines and talk topics was represented.

The Association was formally launched at a meeting with real buzz. The organisers were taken aback (in a good way) to have so many nominations for some office-holder and committee positions that elections were needed. The incoming President is Hannah Fraser (see here and scroll down).

See more about AIMOS and the launch here.

We were told to pay attention to the title of the Association. The ‘A’ does NOT stand for Australia! The ‘I’ stands for interdisciplinary, and we really mean that! Also, meta-research and Open Science are not the same! Phew. But all those points were amply exemplified by the fabulous diversity of speakers and topics. Philosophy to ecology, politics to medicine, economics to statistics, and tons more besides.

A Few Highlights

Haphazardly chosen:

  • Simine Vazire gave a rousing opening keynote, asking whether we want to be credible or incredible. (Breaking news: Simine is joining the University of Melbourne from July next year. Wonderful!)
  • Federal politician Andrew Leigh, author of Randomistas–a great book about using RCTs to develop and guide public policy and about which I blogged last year–spoke about evidence-based policy in the public interest, and how research can shape that. Best one-liner, reflecting on replication: “If at first you do succeed, try, try and try again.” It’s a terrible shame that this year’s election didn’t see him and his colleagues running the country.
  • James Heathers loves naming things. That’s just part of his enthusiastic and highly effective way of communicating. He develops ways to identify errors in published articles, and gives his methods names including GRIM, SPRITE, DEBIT, and RIVETS.
  • Franca Agnoli, from Padua, reported that Bob’s talk (see that link for links to lots of new-statistics goodies) a month or so ago in Cesena, Italy, was terrific.

Estimation: Why and How, now with R

That was the title of my 90-minute workshop. About 22 folks participated, and my slides, which are here, have been accessed by 32 unique visitors. I loved demonstrating Bob’s part-prototype R module, esci.jmo, which you can download here. It can be side-loaded into jamovi. The full version of esci.jmo will be the key upgrade of ITNS to give the second edition. That’s our task for 2020!

Please be warmly encouraged to sign up to join AIMOS, which is intended to be a global association. Next year’s conference will be in Sydney. You can join a mailing list here–ignore the outdated title, I’m sure the AIMOS sites will be updated shortly.


Good Science Requires Unceasing Explanation and Advocacy

Recently in Australia a proposal was made for an “independent science quality assurance agency”. Justification for the proposal made specific reference to “the replication crisis” in science.

Surely we can all support a call for quality assurance in science? Not so fast! First, some context.

Australia’s Great Barrier Reef, one of the wonders of the natural world, is under extreme threat. Warming oceans, increasing acidity, rising sea levels, and more pose grave threats. Indeed, the GBR has suffered two devastating coral-bleaching events in recent years, with perhaps half the Reef severely damaged.

A recent analysis identified 45 threats to the Reef. Very high on the list is coastal runoff containing high levels of nutrients (primarily from farming) and sediment.

The Queensland State Government is introducing new laws to curb such dangerous runoff.

Now, back to the proposal. It was made by a range of conservative politicians and groups who are unhappy with the new laws. They claim that the laws are based on flawed science–results that haven’t been sufficiently validated and could therefore be wrong.

Fiona Fidler and colleagues wrote a recent article in The Conversation taking up the story and arguing that the proposed agency is not the way to improve science, and that the proposal is best seen as a political move to discredit science and to reduce what little action is being taken to protect the Reef. Their title summarises their message: Real problem, wrong solution: why the Nationals shouldn’t politicise the science replication crisis. (The Nationals are a conservative party, which is part of the coalition federal government. This government includes many climate deniers and continues to support development of vast coal and gas projects.)

Fiona and colleagues reiterate the case for a properly constituted national independent office of research integrity, but that’s a quite different animal. You can hear Fiona being interviewed on a North Queensland radio station here. (Starting at about the 1.05 mark.)

Yes, unending explanation and advocacy, as Fiona and colleagues are doing, is essential if good Open Science practices are to flourish and achieve widespread understanding and support. And if sound evidence-based policy is to be supported.

The proposal by the Nationals is an example of agnotology–the deliberate promotion of ignorance and doubt. The tobacco industry may have written the playbook for agnotology, but climate deniers are now using and extending that playbook, with devastating risk to our children’s and grandchildren’s prospects for a decent life. Shame.

A salute to Fiona and colleagues, and to everyone else who is keeping up the good work, explaining, advocating, and adopting excellent science practices.