01 April 2009

"The Grubber", take two

From the editorial office:
The recent story of the "Grubber" (a so-called article generated by a random-text program and published in a journal from the list of the Higher Attestation Commission) has a sequel. The details are set out in today's issue of the newspaper "Troitsky Variant". Since the electronic version of the newspaper is published as a pdf that is not very convenient to read (and the issue "weighs" 4.5 MB), we publish the text of the article with the permission of the editorial board, including a brief description of the background and an interview with one of the developers of the domestic nonsense generator.


Another journal from the "list of the Higher Attestation Commission", "Bulletin of Tomsk State University (Philosophy, sociology, political science)", has published a computer-generated article. However, unlike the "Journal of Scientific Publications of Graduate Students and Doctoral Students", which was excluded from the list of the Higher Attestation Commission, this journal faced no administrative consequences.

In September 2008, the editorial board of "Troitsky Variant" (TrV) ran an experiment. We sent a computer-generated hoax article, "The Grubber: an algorithm for typical unification of access points and redundancy", to the Journal of Scientific Publications of Graduate Students and Doctoral Students. The text was generated in English by the SCIgen program (created by a group of students at the Massachusetts Institute of Technology), and we translated it using another program, STAGE-3 (developed in the Laboratory of Computational Linguistics of the IPPR RAS). TrV's aim was to draw the community's attention to journals that pretend to be scientific, and are even included in the list of the Higher Attestation Commission, but in fact should not be considered scientific. The details of this story were described in TrV No. 13. The experiment turned out to be more successful than could have been expected. Besides the word "Grubber" becoming a household name in the scientific community, the scandal led to tangible administrative measures: the journal was immediately excluded from the list of the Higher Attestation Commission (for a discussion of the results and consequences, see TrV No. 15, p. 3 and TrV No. 16, p. 7).

Now it has become clear that there were other consequences. Rosobrnadzor, together with the Commission of the Russian Academy of Sciences on Combating Pseudoscience, created a special working group, which was instructed to carry out similar checks (by sending out computer-generated quasi-scientific texts "for publication") on all academic and university journals on the list of the Higher Attestation Commission. The texts were generated with a new, original program, RHODES, developed by a group of students and graduate students of the Moscow Institute of Physics and Technology and the Institute of Information Problems of the Russian Academy of Sciences.

One can state with some relief that the situation was not as catastrophic as it seemed: of the first fifty journals checked from the list of the Higher Attestation Commission, only one fell for this insidious trick: "Bulletin of Tomsk State University (Philosophy, sociology, political science)". It published an article, "Darwinism" [1], containing a stream of (computer) consciousness such as this:

"1. Satan. It was necessary to start the conversation about Darwin directly with Satan. The road is straighter" (p.89);

"I'll tell you a secret: teeth, their first appearance and transformation into a jaw for me personally is the most impassable moment, a refutation of Darwinism. And the gums themselves. They are made of a different substance than the meat that our body is stuffed with. This substance is hard, almost like a bone, and when a tooth is pulled out, it is loose and blood flows. How did it all fly into the mouth of creatures on the evolutionary path and get fixed there? Evolutionarily. It wasn't, it wasn't, and then it gradually became. Nonsense, nonsense. Baby talk" (p. 104);

"Some invertebrate worm was crawling, and in the billionth generation it has a mutation: inside the cartilage has ossified, the future, I must say, the spine. I'm not talking about the impossible, how he will inherit this crutch inside himself to his son, daughter – absolutely impossible. I'm talking about him − the freak. After all, with this prosthesis inside him, he will not be able to crawl and I'm afraid that he will also mate" (p.105);

"Some "scientists" (soaked cucumbers)…What can I say about these wet futurologists? Darwinists!" (p. 107);

"And I'm afraid to even say the word "brain". Two kilograms weighing an unsympathetic substance. Looks like a pile of shit. But it doesn't smell, it thinks!" (p. 108);

"And why am I so mad? Why am I so excited about Darwinism? I also found a swell. Well, not swell, but byaka, why should we be sad about that?" (p. 111);

"3. Darwinism and the Bible. I wrote this chapter, and there were about two pages in it, and erased it" (p.95).

Etc.

The fate of the "Bulletin of Tomsk University" and its editorial board, headed by Prof. S.S.Avanesov, hung in the balance. The journal was saved by legal subtleties. The SCIgen program used for the "Grubber" can generate texts only in the field of computer science. The working group therefore used another program, RHODES, which can produce texts in various fields: it generates nonsense by automatically combining small fragments of other people's source texts.

In particular, for texts on biology and evolution, the computer shuffled finely chopped content from creationist sites and forums on the Russian-language Internet (such as antidarvin.ru). As a result, a number of members of the working group had doubts: were the source texts chopped finely enough for the final product to be considered obvious nonsense? The working group could not find a legally precise wording.

This would not have been a serious obstacle to administrative measures had it not been for the recent lawsuit against the Ministry of Education and Science, filed in the arbitration court by V.V.Ivanov, editor-in-chief of the Journal of Scientific Publications of Graduate Students and Doctoral Students (see TrV No. 4/23, p. 12). V.V.Ivanov disputed the exclusion of his journal from the list of the Higher Attestation Commission, arguing that it restricts free competition and violates antimonopoly legislation. Although the court ruled in favor of the defendant, the Ministry of Education and Science, the Higher Attestation Commission decided not to get involved with the Tomsk journal, especially since a general revision of the entire list is planned (see TrV No. 17, p. 5).

One can and should discuss how appropriate it is for administrative authorities to check journals in this way. It looks too much like a provocation, and what is permissible for private individuals does not look so good in the hands of governing bodies. But in any case, it is gratifying that our students were able to create a program that is not inferior in effectiveness to the famous SCIgen.

[1] V.B.Rhodes. Darwinism. Bulletin of Tomsk State University (Philosophy. Sociology. Political Science), No. 1(2), pp. 89-119 (2008).

Kirill Bocharov


Interview
with a member of the working group on quality control of journals, developer of the RHODES program, Candidate of Technical Sciences Mikhail Kovalev

– Are you satisfied with the results of the experiment?

– Yes and no. As a Russian scientist, I am glad that the level of most of our scientific journals was not as terrible as everyone thought after the story with the "Grubber". And as the author of the program, of course, I would like the texts generated by it to be more similar to the real ones. I must say that based on the results of the work, some changes will be made to the next version of the program.

– Why? Does anyone plan to go on stress-testing journals?

– Yes, as far as I know, this work continues. There is also a purely scientific interest: the Turing test has not gone away. Besides, journals are not the only thing in the world. There are the hundred-page reports that scientific institutes are forced to write in great numbers, there are patents, conference abstracts and, finally, student essays…

– It seems that students manage simply by downloading ready-made essays from the Internet.

– Yes, but such essays are easy to catch using Google or various "anti-plagiarism" programs. RHODES makes the text unrecognizable.

– RHODES: what does the name stand for?

– Nothing. This is not an abbreviation, but the name of a Greek island. When the story of the "Grubber" happened, we said that it was not very difficult to write a program that generates human-like texts, and colleagues remembered Aesop's fable "Braggart". It tells how a man who boasted of extraordinary jumps on this island was told: "Here is Rhodes, jump here!" (Hic Rhodus, hic salta).

– That is, you wrote this program on a dare?

– Not exactly. Our group has been working in the field of coherent text generation for a long time. But the immediate reason for writing this program really was the "Grubber".

– What is the difference between your program and the one that wrote the "Grubber"?

– The SCIgen program uses a context-free grammar. This is a well-known algorithmic technique. In principle, since the source code is available, it would be possible to retrain the program on new material, but that would take too much effort. So we went a different way. We used two well-known algorithms that were developed for other purposes but turned out to suit our case and, most importantly, do not need to be retrained for each new field.
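
For readers unfamiliar with the technique, generating text from a context-free grammar can be sketched in a few lines of Python. This is only a toy illustration: the rules and the names GRAMMAR, expand and generate_sentence are invented here and are not taken from SCIgen.

```python
import random

# Toy grammar: each nonterminal maps to a list of alternative expansions.
# The rules are invented for illustration; they are NOT SCIgen's rules.
GRAMMAR = {
    "SENTENCE": [["NP", "VP", "."]],
    "NP": [["the", "ADJ", "NOUN"], ["our", "NOUN"]],
    "VP": [["VERB", "NP"], ["VERB", "NP", "and", "VERB", "NP"]],
    "ADJ": [["typical"], ["redundant"], ["scalable"]],
    "NOUN": [["algorithm"], ["access point"], ["framework"]],
    "VERB": [["unifies"], ["refines"], ["evaluates"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of terminal strings."""
    if symbol not in GRAMMAR:  # terminal: emit as-is
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])  # pick one alternative at random
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Generate one sentence; the same seed always gives the same sentence."""
    rng = random.Random(seed)
    words = expand("SENTENCE", rng)
    text = " ".join(words[:-1]) + words[-1]  # attach the final period
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    for seed in range(3):
        print(generate_sentence(seed))
```

Moving such a generator to a new subject area means writing a new set of rules by hand, which is presumably the effort referred to in the answer above.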

One algorithm is used in the well-known biomedical bibliographic database PubMed, which has the concept of "related articles". The algorithm analyzes the abstracts of articles and groups them by similarity of content. In our program, this algorithm builds the skeleton of the article: a sequence of fragments that are about the same thing.
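
The idea of picking fragments by similarity of content can be illustrated with a minimal sketch. Note the substitution: this uses plain bag-of-words cosine similarity, not PubMed's actual related-articles algorithm, and the function names and sample fragments are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_related(query, fragments, k=2):
    """Return the k fragments most similar to the query text."""
    q = bag_of_words(query)
    ranked = sorted(fragments,
                    key=lambda f: cosine(q, bag_of_words(f)),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical fragments, invented for this example.
    fragments = [
        "teeth and gums refute evolution",
        "the brain weighs two kilograms",
        "gums and teeth appeared gradually",
    ]
    print(most_related("teeth gums", fragments))
```

Repeating the selection with the last chosen fragment as the new query would yield a chain of fragments on one topic, which is one possible reading of the "sequence of fragments" described above.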

The second algorithm is called the "Markov morphological analyzer". This is a linguistic technique that, in our case, edits consecutive fragments so that there are no grammatical contradictions: sentences should agree in tense, number, etc. Simply put, the text should be "smooth".
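
The interview gives no details of this analyzer, so the following is only a loosely related sketch under stated assumptions: a first-order Markov (bigram) model, trained on a tiny invented corpus, chooses the word form that most often follows the preceding word. The names train_bigrams and pick_form and the corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def pick_form(prev_word, candidates, counts):
    """Choose the candidate form seen most often after prev_word.

    On a tie (including unseen words) the first candidate wins.
    """
    seen = counts.get(prev_word.lower(), Counter())
    return max(candidates, key=lambda c: seen[c])


# Tiny corpus invented for this example.
CORPUS = [
    "the worm was crawling",
    "the worms were crawling",
    "a worm was small",
    "many worms were small",
]
COUNTS = train_bigrams(CORPUS)

if __name__ == "__main__":
    print(pick_form("worms", ["was", "were"], COUNTS))
    print(pick_form("worm", ["was", "were"], COUNTS))
```

A real system would work with morphological tags (tense, number, gender) rather than raw word pairs; the sketch shows only the general idea of using transition statistics to keep neighboring words in agreement.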

– Did you manage to achieve this?

– I think so. See for yourself – the text of the article is available on the Internet.

– What texts did you take as the source?

– Mainly, various pseudoscientific Internet sites and forums. In addition, "to enliven the style", we added "A Letter to a Learned Neighbor" by A.P. Chekhov and some stories by Mikhail Zoshchenko.

– Have you seen the reviews of the articles written by your program?

– Yes, some journals not only rejected the articles but also sent reviews. I must say they were quite similar. As a rule, these were rather short texts that noted a lack of novelty, sometimes a poor structure, and general incoherence. Many reviewers pointed out factual errors, which is not surprising: materials of rather dubious origin were used as source texts for the articles.

– Why was it your program in particular that was used?

– Partly by coincidence. When the director of our Institute (IIP RAS), G.L.Skuratov-Belsky, was included in the academic working group on journal quality research, he asked us to speed up the necessary work on fine-tuning RHODES. On the other hand, I am not aware of other projects of this kind.

– Is your program available via the Internet?

– Not yet. In fact, we're not sure that would be the right thing to do. It is necessary to weigh the pros and cons.


Portal "Eternal Youth" www.vechnayamolodost.ru, 01.04.2009
