Blog posts

2024

Genetic LLM Evo: A Third-Party Perspective

less than 1 minute read

Published:

Learning biological patterns is hard, even for large language models. Learning meaningful biology at the genetic level? That’s even harder. Yet, a recently proposed LLM, Evo, asks the bold question: Is DNA all you need?

2022

Expectation-Maximization (EM) algorithm part I (Introduction)

4 minute read

Published:

Introduction

Maximum likelihood estimation (MLE) is a way of estimating the parameters of a statistical model from observations. It finds the parameter values that maximize the likelihood of the observations under the assumed model distribution. However, in many real-life problems, the model depends on variables that are not directly observed in the limited data we have; these are called hidden variables Z. Many problems in genomics involve hidden variables. Typical examples are (i) inferring microbial communities (Z: different communities), (ii) inferring the ancestries of a group of individuals (Z: different ancestries), and (iii) inferring the cell type content from specific sequencing data (Z: different cell types). For problems like these, it is typically hard to perform maximum likelihood estimation directly.
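
To make the contrast concrete, here is a minimal sketch in Python (NumPy assumed). It is an illustration, not code from the post: the function names `gaussian_mle` and `mixture_log_likelihood` and the two-component Gaussian mixture are hypothetical choices. With no hidden variable, the Gaussian MLE has a closed form. With a hidden component label Z, the likelihood must sum over every possible value of Z for each observation, which is what makes direct maximization awkward.

```python
import numpy as np

def gaussian_mle(x):
    """Closed-form MLE for one Gaussian: sample mean and (biased) variance."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    return mu, var

def mixture_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a 1-D Gaussian mixture with the hidden label Z summed out."""
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    weights = np.asarray(weights, dtype=float)       # shape (k,)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # log[ p(Z = k) * p(x | Z = k) ] for every observation and component
    log_joint = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi * variances)
        - 0.5 * (x - means) ** 2 / variances
    )
    # log-sum-exp over components: the sum over Z sits inside the log
    peak = log_joint.max(axis=1, keepdims=True)
    log_marginal = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return float(log_marginal.sum())

# Simulated data: Z is drawn first, then x given Z. Only x would be observed.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=1000)
x = np.where(z == 0, rng.normal(-2.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000))

mu_hat, var_hat = gaussian_mle(x)  # easy: no hidden variable involved
ll_single = mixture_log_likelihood(x, [1.0], [mu_hat], [var_hat])
ll_mixture = mixture_log_likelihood(x, [0.5, 0.5], [-2.0, 3.0], [1.0, 1.0])
print(ll_single, ll_mixture)
```

Because the sum over Z appears inside the logarithm, setting the derivative of the mixture log-likelihood to zero does not give a closed-form solution the way it does for a single Gaussian.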