13 December 2021

The dog ate the protocol

Why oncologists' experiments reproduce no better than psychologists'

Polina Loseva, with contributions from Ilya Ferapontov and Ivan Shunin, N+1

In 2015, "Collaboration for Open Science" (Center for Open Science, COS) she reported that she was able to reproduce only 39 results from 100 psychological articles. Now their report on the state of affairs in oncobiology has been released, and it contains almost the same figures: it turned out to reproduce only 46 percent of the results. Does this mean that such seemingly different psychology and biology have common difficulties? Or is each of these areas unhappy in its own way? Or maybe these are problems of science in general?

In 2011, 270 scientists got together to repeat 100 experiments from articles published in three leading psychology journals. It was not that these articles had aroused any suspicion. On the contrary, the participants of the Reproducibility Project: Psychology set out to check perfectly ordinary papers that no one had any complaints about. In more than half of the cases, the effects reported in the original articles were not confirmed in the repeated experiments.

Results of the Reproducibility Project: Psychology. The x-axis shows the effect size found in the original article, the y-axis the effect size obtained in the replication. All points below the diagonal correspond to effects that could not be reproduced in full. Blue dots mark effects that were reproduced reliably (p < 0.05), red dots mark statistically non-significant ones (p > 0.05). Open Science Collaboration / Science, 2015.

Brian Nosek and his like-minded colleagues did not blame the authors of the articles, nor did they call on the journals to retract them. "How many of the effects we tested are true? Zero," the scientists wrote in their report. "How many of the effects we tested are false? Zero." The problems they documented do not show that the published results are wrong, only that they are hard to verify in practice.

Leaving the psychology community to reflect on how that had happened, Nosek moved on to the next project. Together with colleagues, he selected the 53 most popular cancer biology papers published from 2010 to 2012. But he ran into unforeseen difficulties: in the end, after six years and more than a million dollars, only 23 of them made it into the Reproducibility Project: Cancer Biology report.

Following in the footsteps

Of course, not all the planned experiments took place in the previous project either: 47 of the 158 articles on cognitive and social psychology went unverified because checking them required special equipment that the collaboration's participants did not have on hand, specially trained staff, or rare subjects (people with specific psychiatric diagnoses, and monkeys). But in cancer biology there were far more obstacles, so many that they provided material for a separate publication (Errington et al., Reproducibility in Cancer Biology: Challenges for assessing replicability in preclinical cancer biology, eLife).

Of the 193 experiments, Nosek and colleagues complain in that paper, the statistical analysis of the results could be checked in only four cases; for all the others, the data given in the original papers were simply not sufficient. The researchers requested the data from the authors, but in 68 percent of cases the requests went unanswered.

Moreover, not one of the 193 experiments was described in enough detail to be reproduced using the article alone as a set of instructions. So every team that undertook to repeat one had to consult the authors of the original papers, and about a third of the authors refused to help.

Once the remaining experiments got under way, it turned out that in most cases the study protocols had to be changed: the cells and mice did not behave as described in the original articles. As a result, of the 193 planned experiments (a single article can contain several results that need checking), only 50 could be carried out.

"This is a problem of traditions," said oncoepidemiologist Anton Barchuk, a researcher at the Petrov Oncology Research Center and Tampere University (Finland), — often it is not the authors of the article who deliberately hide [the data], the format of the article does not allow them to [do this]. For example, it is customary for authors to speculate [about their results], and the discussion eats up a lot of space — including from "Methods", which, in my opinion, is a much more important section".

Not all of the 50 experiments that were eventually completed produced the expected results. Of the 158 effects discussed in the original articles, reliability could be assessed for only 112, and of those only 46 percent were reproduced reliably (that is, with statistical significance).

Results of the Reproducibility Project: Cancer Biology. The x-axis shows the effect size found in the original article, the y-axis the effect size obtained in the replication. All points below the diagonal correspond to effects that could not be reproduced in full. Blue dots mark effects that were reproduced reliably (p < 0.05), red dots mark statistically non-significant ones (p > 0.05). Figure from Errington et al.

Humans and Mice

"Show this link to anyone," advises psychologist Jay Van Bavel on Twitter, "who will say that psychological science is less reliable than "real science" like biology." Nosek quotes a colleague's tweet with the comment: it is incorrect to compare two projects directly. Articles on psychology were taken from three reputable specialized journals, and oncobiological ones were selected by altmetry and citation on the Web of Science and Scopus platforms. In addition, psychological articles usually describe one effect, and biological articles describe several at once. Therefore, the data obtained does not mean that 54 percent of articles on oncobiology are not reproduced in their entirety — perhaps only some part of them is not reproduced.

Nevertheless, psychology and cancer biology certainly have something in common. In correspondence with N+1, Nosek suggests that it is "a reward system for new, positive, surprising results at the expense of rigor, transparency and detailed descriptions." In other words, the real culprit is publication bias: the tendency of journals to accept papers whose results look interesting. Finding a correlation is therefore valued more highly than finding none, and extracting new findings takes priority over verifying old ones.

"This is treated with the help of registered reports," explains Ilya Yasny, head of scientific expertise of the pharmaceutical fund Inbio Ventures. This is an alternative scheme of publication in a scientific journal: the authors do not send to the editorial office a ready-made report on the work done, but an application for a specific experiment to test a certain hypothesis. And if the experts in the editorial office approve it, then the journal undertakes to publish the result (if it can be obtained, of course) — regardless of whether it is positive or negative. Thus, the scientific work is reviewed twice — before the experiment begins, and after the data is processed. This, on the one hand, makes the work more reliable and reproducible, and on the other, helps to save money — because reviewers will stop unnecessary and meaningless experiments at the start.

The concept of registered reports was devised by the same people at the Center for Open Science. They ran their cancer biology project the same way: the protocols were first agreed upon and published separately, and only then came the results. The effect of this practice is already noticeable, Yasny believes: "judging by an eyeball assessment, the quality of [papers in the field] has grown since 2013 as a result."

Timothy Errington, head of the Reproducibility Project: Cancer Biology, also warns in correspondence with N+1 against comparing psychology and cancer biology head to head, despite the similar results of the two COS projects. The difference between the disciplines that struck him, he says, is more methodological: "analytical flexibility" versus "experimental flexibility". When researchers reproduce psychology studies, the problems arise at the last stage: it proves hard to reach the same conclusions from the same data. In cancer biology, the obstacles begin long before the statistical analysis: it is hard even to assemble all the necessary materials and reagents. And this is not unique to cancer biology, Yasny believes. In his words, "the situation with animal models is bad everywhere [in biology]."

They have to live with it

One way or another, cancer biology seems to take two hits at once: to the reproducibility of its experiments and to the reproducibility of its results. Given that Nosek and colleagues managed to carry out fewer than half of the planned experiments and to obtain fewer than half of the expected effects, the overall reproducibility comes out to no more than a quarter.
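As a rough illustration of how these figures combine, here is a back-of-the-envelope sketch using only the numbers quoted above; it is an editorial illustration of the arithmetic, not a calculation from the COS report itself:

# Back-of-the-envelope estimate from the figures quoted in this article.
planned_experiments = 193    # experiments selected for replication
completed_experiments = 50   # experiments that could actually be run
effects_assessable = 112     # effects (out of 158) whose reliability could be assessed
share_reproduced = 0.46      # share of assessable effects that replicated significantly

completion_rate = completed_experiments / planned_experiments  # ~0.26, i.e. "less than half"
print(f"Experiments completed: {completion_rate:.0%}")
print(f"Assessable effects reproduced: {share_reproduced:.0%}")

# Both shares are below one half, so their product, a crude end-to-end
# estimate of reproducibility, is below one quarter.
print(f"Rough end-to-end estimate: {completion_rate * share_reproduced:.0%}")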

But none of the experts N+1 spoke to were particularly depressed by this. After all, replicating experiments is an integral part of the scientific process. "What the authors of the original articles published," says Yasny, "is innovation. Every innovation opens up opportunities, and [its] verification shows how realistic those opportunities are. The fact that some scientists do something and others do not take it on trust: that is science."

Nor is the scale of irreproducibility anything new. Back in 2011, Bayer employees tried to repeat several dozen experiments, including some in oncology (though without revealing the details), and the results matched in no more than 25 percent of cases. Those involved in clinical trials know this firsthand: in 19 out of 20 cases, according to Yasny, antitumor drugs fail at that stage, "because too many drugs pass the filters on the strength of not very well done preclinical studies."

So those who work directly with the results of such papers have their own recipes for deciding what to believe and what not. "We all learn to see," says Yasny, "where the problems are and how well a study has been done. Besides, what is developed inside Big Pharma is better organized methodologically [than academic science]. If they push insufficiently studied drugs into clinical trials, they shoot themselves in the foot, because they lose time and money. With small startups it is a bit trickier, since their financing and investor interest depend on their results, but they cannot openly lie either."

"There is only one recipe," says Barchuk, "to be critical of the results obtained in one or two studies. It's always worth waiting for someone to repeat them. Perhaps this new study of theirs will also be difficult to repeat."

Portal "Eternal youth" http://vechnayamolodost.ru

