22 October 2015

I want to become a bioinformatician. Where to start?

Mikhail Gelfand's advice


I'm starting a new blog. How hard it always is to start something new! But I want to. I am sure that many of those drawn in by the title of this article have thought about, wanted, or even dreamed of working in such wonderful sciences as bioinformatics and systems biology, but did not dare. Or they started, but it seems they took a wrong turn.

Help came from bioinformatician Mikhail Gelfand: Doctor of Biological Sciences, Professor, Deputy Director of the Institute for Information Transmission Problems of the Russian Academy of Sciences, member of the European Academy, laureate of the A.A. Baev Prize, member of the Public Council of the Ministry of Education and Science, and one of the founders of Dissernet. He visited Kiev in September 2015, gave three fascinating lectures at different levels of difficulty, inspired many beginners and dreamers, and gave a number of interesting interviews, including one to "My Science", which is not afraid to start something new and to ask questions.

If you are not familiar with bioinformatics yet, then here, here and here Mikhail Sergeyevich talks about what kind of bird it is, how it flies and why.

In this article, Professor Gelfand answers the question "I want to become a bioinformatician. Where to start?"

Who do you need to be to become a bioinformatician?

Bioinformatics, and this is what I love it for, is absolutely elastic with respect to effort.

The main creative quality required is a strong backside: perseverance and a capacity for work. Insights of the "went to bed and dreamed up the periodic law or the benzene formula" kind do not happen in bioinformatics. Well, they probably do, but I haven't seen one.

Bioinformatics was originally a science for losers. Most often, the people who became bioinformaticians were either failed mathematicians or failed biologists whose test tubes kept slipping out of their hands. When I, a mathematician by training, discovered that I could not prove theorems, I was very lucky, because bioinformatics emerged just then.

Another example: I have an absolutely wonderful young woman in my laboratory who graduated from the Faculty of Economics, worked for three years as an analyst at some company, studied at our bioinformatics school for two years, then took on a bacterial genome annotation task and, in the course of the annotation, found a new enzyme and then a new metabolic pathway. This transformation took three years.

Where to start studying bioinformatics?

When I started doing bioinformatics, there was nothing to study; you just had to pick it up and do it.

Those were romantic times, when there was no science except what you did yourself. It's not like that now. Now it is a field you have to enter, and a great deal has already been done in it.

First, you need to know molecular biology well. Beyond that, there are decent textbooks on bioinformatics, and they differ: some are more biological, some more algorithmic. One caveat: "Bioinformatics for Dummies" is not a book to learn from, because it is just a collection of recipes for when you need to do something specific.

Second, you need minimal programming and, third, minimal statistics. There is general statistics, and then, if you study transcriptomics for example, there are specific methods adapted to that task (there are good reviews on this).

Then you need to roughly select an area and read the latest reviews in it, simply in order to come up with a problem.

Which comes first: learning to program or finding a problem?

It can go either way, because there are tasks that need no programming at all, just existing tools that are available on the Internet.

There are many tasks that require programming, but only the most minimal: say, rewriting a file in a different format or, roughly speaking, feeding the results of one program to the input of another. I will reveal a terrible secret: I wrote my last line of code about 15 years ago, which does not prevent me from being a more or less successful bioinformatician.
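
As an illustration of how small such "glue" programming can be, here is a minimal Python sketch that rewrites a FASTA file as tab-separated lines so that another program can read it. The choice of FASTA and TSV, the function name `fasta_to_tsv` and the toy records are assumptions made for this example; the interview does not name any particular format.

```python
import io


def fasta_to_tsv(fasta_handle, tsv_handle):
    """Write one 'identifier<TAB>sequence' line per FASTA record.

    Returns the number of records written. Hypothetical helper for
    illustration only; it keeps all records in memory.
    """
    records = []  # each item is [identifier, [sequence chunks]]
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line: the identifier is the first word after '>'.
            fields = line[1:].split()
            records.append([fields[0] if fields else "", []])
        elif records:
            # A sequence line belonging to the most recent header.
            records[-1][1].append(line)
    for identifier, chunks in records:
        tsv_handle.write(f"{identifier}\t{''.join(chunks)}\n")
    return len(records)


# Toy input standing in for the output of "one program".
example = io.StringIO(">seq1 toy record\nACGT\nACGT\n>seq2\nGGCC\n")
out = io.StringIO()
n_records = fasta_to_tsv(example, out)
print(out.getvalue(), end="")
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(...)`; the logic stays the same.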

How to choose an area and come up with a good task?

The situation now is that it is difficult to come up with a problem yourself; or rather, it is easy, but quite likely someone has already thought of it.

That would be a pity. Ideally, you should talk it over with someone.

You can take the dozen journals with the highest impact factors whose titles include "bioinformatics", "computational biology" or "genomics", look through the article titles, and work out which of them you find interesting. Having roughly defined the area, read the reviews and understand what people in it are doing.

Read a lot, and while reading think a lot: work out what was left unfinished in the article, what was done badly, where it is simply untrue, and where something was swept under the carpet. A good exercise is to take a Nature article on systems biology, read it carefully with all the supplementary materials, red pencil in hand, and mark every slip. If you do this carefully several times, you will find a lot of them.

After a lot of reading of good articles and a lot of thinking, quite non-trivial problems start to appear.

For example, in systems biology you have a lot of data, but it is very heterogeneous. A good exercise is to write the different types of data on a piece of paper, take all possible pairs, and think about what problem each pair could help solve. The next exercise is to take three types of data and come up with a task for them. Most likely, if you have come up with it, someone else already has too, but not always.
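
The pairs-and-triples exercise can also be laid out mechanically. The following is only a sketch: the list of data types is hypothetical and should be replaced with whatever data is actually available to you, and the helper name `data_combinations` is made up for this example.

```python
from itertools import combinations

# Hypothetical data types; the interview does not list specific ones.
data_types = [
    "genome sequences",
    "transcriptomics",
    "proteomics",
    "protein-protein interactions",
    "metabolomics",
]


def data_combinations(types, size):
    """Return every unordered combination of `size` data types."""
    return list(combinations(types, size))


pairs = data_combinations(data_types, 2)
triples = data_combinations(data_types, 3)

# One prompt per pair: the thinking still has to be done by hand.
for first, second in pairs:
    print(f"What problem could {first} + {second} help solve?")

print(f"{len(pairs)} pairs and {len(triples)} triples to think about")
```

The code only enumerates the combinations; deciding which of them leads to a worthwhile problem is the actual exercise.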

Another good exercise. I hate giving exams, so I have a standard exam question: I ask about the topic of the student's research and then say: "Imagine you had a sequencer. What would you do with it for your thesis?"

It is also useful to come up with a task first and then see whether there is data for it. Quite likely it will turn out that there already is.

You can, of course, write to people who have published good articles in the area you are interested in, but the response rate is very low. The right option is to find out where these people gather and try to get to good conferences and schools. There you can see what you like, buttonhole someone and ask for a self-contained task that could be handed to an outsider, and then consult a little from time to time; in principle, you can do bioinformatics over Skype.

Which areas of bioinformatics should a beginner prefer?

You can try to take some not too fashionable field of bioinformatics.

If you study transcriptome changes in cancer, a million people are already grazing there and there is clearly nothing left to catch. You can do very interesting, very beautiful work there, but to come up with it you need to have been around the field for five years.

Something like bacterial evolution, on the other hand, has fewer people in it. It is not so fashionable, and you will not publish in Nature right away, but it is quite possible to enter the field that way.

In my opinion, a very interesting and underrated topic is the genomics of protozoa. It is exotic, unfashionable and lovely. The genomics of archaea is underrated as well: the dream of the whole "archaeal" community is to finally find at least one pathogenic archaeon and get at least some money.

It is very good when there are biologists nearby who have fresh data, but you can also come up with tasks for publicly available data.

A typical task I give to someone starting from scratch: there are all sorts of exotic bacterial genomes that have been sequenced and annotated automatically after a fashion, and many have not been annotated at all. Simply a careful genome annotation is a perfectly adequate task, both for learning the techniques and for getting used to the field. Finding microbiology groups that have genomes but no annotations is easy, and for advice on the details you can turn, well, to us at least.

These are all somewhat side entrances to the field, "through the basement". If you set out to cure humanity of cancer right away, it is hopeless.

How to approach your first research project in bioinformatics?

If you have no one to consult at first, take a good article that you liked and repeat it on different material.

Say someone studied selection in mammals, and you look at birds. It doesn't sound like much fun, but when you repeat a good article, on the one hand you realize that it is not as good as it seemed, and on the other hand you master the field. And no one said that evolution in birds works the same way as in mammals. In the worst case you will simply learn something; in the best case you will see something non-trivial. That may well happen.

The key words in what I said are "a good article". To understand which article is good and which is bad, you need to read a lot of them. Moreover, I believe that any work in bioinformatics done more than five years ago should be redone, because the world changes a great deal. When new data arrives, you can do subtler things, or revise something that was done on a small dataset. Here is another good approach: take a good comparative genomics article from five to ten years ago and redo it not on ten genomes but on a thousand. The ten genomes were processed by hand, though, so here you will have to devise, or find, the right tools. It is quite possible to come up with something fun here.

Is it possible to do bioinformatics yourself or do you need a group?

According to Israeli army regulations, the combat unit is the individual soldier, not a company or a platoon.

So the combat unit in bioinformatics is one person. In theory it is possible to do bioinformatics alone, but in practice it is harder, because talking is very useful. When I started, I was alone for quite a long time; I thought up puzzles for myself and worked on them myself. It's probably harder now, again because our science has left its romantic age and become a real science.

In general, my style is to talk all the time. I work much more efficiently in summer than in winter, because in summer I can walk around the courtyard with my graduate students, smoke a pipe and talk, whereas in winter it's cold and I have to talk indoors, which is less comfortable. And good ideas often arise in conversation.

That said, there are many excellent examples of bioinformaticians working in Australia and New Zealand. And in the States, if you look, there is no particular local community either; people communicate mainly at conferences.

What is really useful, and almost impossible for post-Soviet people to accept, is this: if you come to a conference, you should not go to museums but sit grimly through all the talks, and during the break you should not eat a pie huddled in a corner but walk around, introduce yourself to everyone and talk. Few people know how to do this; it is psychologically difficult, but it has to be done. So, if there is any money at all, choose good conferences, submit a poster, stay in a hostel to save money and, once you are there, first, listen to everything and, second, do not be shy about mingling. That is the custom there. It is perfectly normal to approach Professor N and say, "I'm a graduate student from such-and-such a place and I was very interested in what you were talking about, could you tell me more about it?" Or the other way round: say what you do and ask about something. Just don't talk for 15 minutes about what you are doing, because he will simply run away from you; you need to be able to explain quickly what you do and ask a specific question. This is absolutely accepted; it is how conferences are meant to work.

The "three pillars" of bioinformatics:
  1. Good knowledge and understanding of biology
  2. At least minimal programming
  3. Basic and then specialized statistics

Let's summarize and draw up an action plan for a novice bioinformatician:
  1. Read and think a lot! (you need to read good articles, but always think)
  2. Go to schools/conferences and talk (conferences should be good)
  3. Develop thinking (see point 1)
  4. Choose an interesting direction
  5. Look for and come up with good puzzles

Can we get together?

For a fast, strong start you need a team: to talk, share, criticize, hone ideas and skills, and simply keep each other from getting lazy.

So let those of us who truly see bioinformatics, systems biology and synthetic biology as an important science for themselves organize a seminar, for a start! I look forward to all your suggestions and wishes in the comments or by email at nika.biph@gmail.com. And while we gather our thoughts, let's read a lot and think a lot! Here you will find the minimum reading list for bioinformatics recommended by Mikhail Gelfand.

Portal "Eternal youth" http://vechnayamolodost.ru