About Me

Here are some questions I am frequently asked by friends who are not familiar with our field.


  • Q: Am I a biologist or a computer scientist?

    A: I consider myself a computer scientist. My daily research routine involves processing data, developing methods, designing integrated pipelines for large-scale data analysis, and arguing theoretically and empirically about our results, as other people in this area do. Unlike biologists, I do not handle pipettes, run western blots, or experiment on model organisms. Perhaps what distinguishes us from other computer scientists is the type of data we deal with. We need a decent knowledge of population genetics, molecular biology, etc., as well as strong data preprocessing skills, since data in our field is arguably much noisier, more limited, and less intuitive than other types of data (images, natural language).

  • Q: As a computer science major, why do you study biology?

    A: Many computer science researchers develop methods for specific applications. Just like researchers who focus on computer vision, NLP, or cyber-physical systems, we focus on answering biological questions. Doing so takes additional effort to understand the data and to learn the biological knowledge needed to analyze it.

  • Q: Why did I choose computational biology as my research field?

   A: I find this one of the most appealing areas. The genetic signals we discover can help us better understand human beings and bring new opportunities for clinical care. We have seen many great works that shed light on the origin of human beings, the risk of developing certain diseases, precision medicine, etc. Yet our knowledge of our own bodies is still limited, which leaves vast opportunities in this field.

  • Q: What methods do I use to answer these questions?

   A: Genetic datasets are usually high-dimensional with a limited sample size. Handling this kind of data requires more careful modeling assumptions and solid domain knowledge. Therefore, statistics, linear algebra, and data mining skills are essential for solving these problems.

  • Q: Do I use deep learning?

   A: Yes, but in a prudent manner. Straightforward implementations of deep learning algorithms usually don’t work well on genetic datasets. In terms of performance, this could be due to the unique structure of genetic data and the limited amount of training data. In terms of computation, it can be infeasible to apply these models to extremely high-dimensional datasets. Finally, current deep learning models are generally hard to interpret, whereas interpretability is perhaps the most crucial factor in genetics.
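
To make the point above about high-dimensional data with few samples more concrete, here is a minimal sketch in Python with NumPy. It is only an illustration on simulated data, not code from any of my projects: the sizes `n` and `p`, the penalty `lam`, and the function names `ridge_dual` and `ridge_primal` are all made up for this example. With far more features than samples, ordinary least squares has no unique solution, so we add an explicit assumption (here a ridge penalty), and a little linear algebra lets us solve a small n x n system instead of a large p x p one.

```python
import numpy as np


def ridge_primal(X, y, lam):
    """Ridge regression via the p x p normal equations."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def ridge_dual(X, y, lam):
    """Same ridge solution via the n x n dual system, cheaper when p >> n."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha


# Simulated stand-in for a genetic dataset: few samples, many features.
rng = np.random.default_rng(0)
n, p = 50, 500                      # hypothetical sizes, chosen for illustration
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0                 # only a handful of features carry signal
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# The rank of X cannot exceed n, so X.T @ X (p x p) is singular and
# unpenalized least squares does not have a unique solution.
rank = np.linalg.matrix_rank(X)

lam = 1.0                           # hypothetical penalty strength
beta_primal = ridge_primal(X, y, lam)
beta_dual = ridge_dual(X, y, lam)

print("rank of X:", rank, "with p =", p)
print("primal and dual agree:", np.allclose(beta_primal, beta_dual))
```

The two functions return the same coefficients because of the identity (XᵀX + λI)⁻¹Xᵀ = Xᵀ(XXᵀ + λI)⁻¹. In practice the choice of penalty and the way `lam` is tuned depend heavily on the biological question, which is where the domain knowledge comes in.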