August 20, 2016

Science: Thoughts on the "Big data" (-omics) approach in medical science

This past week I got two papers published: one an original research article, the other a review article. It is a coincidence that the publication dates fell in the same week. But hey, I say it was a good week.

Links to the NCBI PubMed abstract pages:
(PubMed is a government database of life science publications. We use it regularly.)

Research article: "Systemic chromosome instability in Shugoshin-1 mice resulted in compromised glutathione pathway, activation of Wnt signaling and defects in immune system in the lung"

http://www.ncbi.nlm.nih.gov/pubmed/27526110


Review article: "Emerging links among Chromosome Instability (CIN), cancer, and aging"

http://www.ncbi.nlm.nih.gov/pubmed/27533343



I use the "Big data" (-omics) approach in my research. The research article above is an example.


The approach that has long been the gold standard in science is the "hypothesis-driven" approach.

Based on existing data, we deduce a mechanism, build a hypothesis, test it with experiments, and prove (or disprove) the hypothesis. If it is correct, we write a paper and publish it. 

Imagine professional baseball. A 30% batting average (hereafter "hitting rate") makes you an excellent hitter, and 40% puts you among the greatest.

We scientists do not openly talk about the "hitting rate" for our hypotheses, that is, how many of the hypotheses we come up with are supported by experiments and turn out to be correct.

I assume that, as with baseball players, there are "good" scientists with higher hitting rates and "bad" scientists with lower hitting rates.

Judging from an interview with Dr. Shinya Yamanaka, who won the Nobel Prize in 2012 for his work on induced pluripotent stem cells, what he considers "normal" is a hitting rate of around 20%. He would start suspecting some kind of fraud or mistake if the hitting rate got too high (30-40% or more).

Right. We may not readily admit it, but there are many "incorrect" hypotheses that were not supported by experiments.


Now...

Technological advances in the last two decades enable us to take another approach. Some call it the "data-driven" approach.

With the technology, we obtain a "big" dataset. In my case, it is the expression profiles of all expressed genes in the sample tissues (an approach called transcriptomics). The technology does not discriminate or focus on a specific gene of interest. It does the job for all 20000+ genes in an unbiased manner.

Afterwards, with bioinformatics software, we figure out what is "wrong", or different, in the samples.
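
To give a feel for what "figuring out what is different" means, here is a minimal sketch in Python. It is not the software or the method used in my paper. The gene labels and expression values are made up purely for illustration, and the function names are my own. It treats every gene the same way, computes how much its average expression changed between control and mutant samples, and ranks the genes by the size of that change.

```python
import math
from statistics import mean

# Toy expression values (arbitrary units), invented purely for illustration.
# Keys are hypothetical gene labels; each list holds replicate samples.
control = {
    "gene_A": [10.0, 12.0, 11.0],
    "gene_B": [50.0, 48.0, 52.0],
    "gene_C": [5.0, 6.0, 7.0],
}
mutant = {
    "gene_A": [40.0, 44.0, 48.0],
    "gene_B": [49.0, 51.0, 50.0],
    "gene_C": [1.0, 2.0, 1.5],
}


def log2_fold_changes(control, mutant, pseudocount=1.0):
    """Return {gene: log2(mutant mean / control mean)} for every gene.

    Every gene gets the same treatment; none is picked in advance.
    The pseudocount avoids division by zero for unexpressed genes.
    """
    return {
        gene: math.log2(
            (mean(mutant[gene]) + pseudocount)
            / (mean(control[gene]) + pseudocount)
        )
        for gene in control
    }


def rank_by_change(changes):
    """Sort genes by absolute log2 fold change, largest first."""
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


ranked = rank_by_change(log2_fold_changes(control, mutant))
for gene, lfc in ranked:
    print(f"{gene}: log2 fold change = {lfc:+.2f}")
```

In this toy example the genes that changed the most float to the top of the list, whether or not anyone expected them to. A real analysis covers all 20000+ genes and relies on dedicated statistical tools rather than a simple ratio of averages.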


Scientists have their own training backgrounds. Immunologists know more about immunology than others, and cell biologists know more about the events they study than others.

Technology and computers don't care about my training background. They just point out the differences in the samples. Many of them are unexpected.

From the dataset that was obtained in an unbiased manner, I can start building hypotheses that are already backed up by data. We build "correct" hypotheses, and test them. 

My own training background is in molecular genetics and cell biology. But if necessary, I will team up with an immunologist, a cell biologist, or whoever has the necessary skills.


It sounds like cheating, doesn't it? It is not. 

In my opinion, data-driven science is a valid approach that can compete against, or complement, the conventional hypothesis-driven approach in a very productive manner.

It will yield results and will help advance biomedical research.


We humans are heavily biased. Scientists are no exception. Scientists have training backgrounds that can affect their thinking and their course of action. We may start treating everything as a nail because we have a hammer.

Cells and organs don't care about my research background. What works, works for them.

I work in translational oncology now. When we are dealing with the body, be it of animal or human, what matters most is that we consider all factors and events, so that we can take informed and intelligent approaches to tackle diseases.

The conventional hypothesis-driven approach may be intelligent, but when you are dealing with the body as a whole, you may not be informed enough. The data-driven approach can shed light on the blind spots created by our biases.


I love the "big data" approach. I don't have to worry about my "hitting rate". I'll just swing the bat at the place where I know the ball will be coming.

The joy of getting "hits" (getting the hypothesis correct) never gets old. Just be mindful of novelty, and do not fall into confirmatory science.