A Better Algorithm for Predicting How Cells Behave


In a preprint published on arXiv, researchers from Altos Labs have described a machine learning algorithm that performs end-to-end prediction of how cells’ gene expression will respond to interventions.

The need for prediction

Simulating biological processes on a computer is extremely difficult. While advanced algorithms such as Google DeepMind’s AlphaFold have revolutionized protein structure prediction, the complete biochemistry of a cell is orders of magnitude more complex.

One way around this is to experiment on live cells directly. Modern RNA sequencing techniques make it relatively simple to measure the effects of genetic perturbations and small-molecule interventions. However, even with this technology, the space of possible experiments is enormous, different cell types respond differently, and changing how a cell behaves often requires multiple perturbations at once [1].

Machine learning algorithms are therefore meant to identify promising perturbations in silico, so that these predictions can be tested in vitro before the research moves on to animals and people. Interestingly, previous work has found that simpler algorithms are generally more useful across broad applications and that removing extra constraints improves these models’ ability to generalize [2].

A flow algorithm with an unusual design choice

To that end, the researchers created PRiMeFlow, an algorithm that works directly in gene expression space rather than compressing information into lower-dimensional spaces, as previous algorithms have done [3]. As a flow-based model, it learns how to transform a known distribution of gene expression profiles, such as that of unperturbed cells, into a previously unseen one, such as that of cells after a given perturbation.
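As a rough illustration of what "flow" means here, the sketch below assumes a standard flow-matching setup with straight-line paths (the technique named in the CellFlow reference [3]); it is not PRiMeFlow’s actual objective or network. A velocity field is fitted so that, integrated from t = 0 to t = 1, it carries control cells toward perturbed cells, all within the raw gene expression space. To keep it short, a linear model fitted by least squares stands in for the neural network, and the five-gene data, the `shift` vector, and the function names `train_linear_flow` and `transport` are hypothetical.

```python
import numpy as np


def train_linear_flow(x0, x1, n_pairs=20000, seed=0):
    """Fit a toy velocity field v(x, t) = [x, t, 1] @ W with a flow-matching loss.

    Straight-line paths: x_t = (1 - t) * x0 + t * x1, with target velocity x1 - x0.
    Least squares on a linear model stands in for training a neural network.
    """
    rng = np.random.default_rng(seed)
    # Randomly pair source (control) and target (perturbed) cells; sample t ~ U(0, 1)
    i0 = rng.integers(0, len(x0), size=n_pairs)
    i1 = rng.integers(0, len(x1), size=n_pairs)
    t = rng.random((n_pairs, 1))
    xt = (1 - t) * x0[i0] + t * x1[i1]   # points along each straight path
    target = x1[i1] - x0[i0]             # velocity of each straight path
    features = np.hstack([xt, t, np.ones((n_pairs, 1))])
    W, *_ = np.linalg.lstsq(features, target, rcond=None)
    return W


def transport(W, x, n_steps=50):
    """Push cells from t = 0 to t = 1 by Euler-integrating the learned velocity."""
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((len(x), 1), k * dt)
        features = np.hstack([x, t, np.ones((len(x), 1))])
        x = x + dt * (features @ W)
    return x


# Toy data: 5 hypothetical genes; the perturbation raises gene 0 and lowers gene 2
rng = np.random.default_rng(1)
shift = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
control = rng.normal(size=(2000, 5))
perturbed = rng.normal(size=(2000, 5)) + shift

W = train_linear_flow(control, perturbed)
predicted = transport(W, control[:500])
print(predicted.mean(axis=0).round(2))  # should land close to `shift`
```

A real model replaces the linear fit with a deep network (in PRiMeFlow’s case, a U-net) and conditions the velocity on the perturbation being simulated, but the training signal, matching the velocity along paths between source and target cells, has the same shape.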

The authors note that their architecture of choice, a U-net, is normally considered suboptimal for this task: the order of genes in an expression profile is arbitrary, while a U-net is geared towards spatially organized data in which the relationships between nearby data points matter. A multi-layer perceptron (MLP) would normally be considered the better option, but in an ablation study, replacing the U-net with an MLP only worsened the model’s predictions. The authors admit that they do not know why this is the case, and they suggest investigating cross-attention mechanisms, which might aggregate information across genes without spatial biases.
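The ordering argument can be made concrete with a toy example. As a simplifying assumption, not a description of PRiMeFlow’s actual layers, the sketch treats the U-net’s convolutions as sliding along the gene axis; the six gene values, the kernel, the weights, and the helper names `conv_over_genes` and `dense_layer` are all hypothetical. It shows that a convolution’s output depends on which genes happen to sit next to each other, while a dense MLP layer treats any gene order as equivalent once its weights are relabeled.

```python
import numpy as np


def conv_over_genes(x, kernel):
    """Convolution along the gene axis (zero-padded, same length), as a conv layer would do."""
    return np.convolve(x, kernel, mode="same")


def dense_layer(x, W):
    """Fully connected layer (no bias or activation), the building block of an MLP."""
    return W @ x


x = np.array([5.0, 0.0, 0.0, 0.0, 0.0, 1.0])  # expression of six hypothetical genes
perm = np.array([0, 5, 1, 2, 3, 4])           # an equally valid gene ordering
kernel = np.ones(3)                           # each output sums a gene and its two neighbors

# Convolution: reordering genes changes which genes are "neighbors", so the
# outputs are not merely reordered; they are different numbers.
print(conv_over_genes(x, kernel))        # values 5, 5, 0, 0, 1, 1
print(conv_over_genes(x[perm], kernel))  # values 6, 6, 1, 0, 0, 0

# Dense layer: reordering the inputs together with the weight columns leaves the
# output unchanged, so no gene ordering is privileged.
W = np.random.default_rng(0).normal(size=(3, 6))
print(np.allclose(dense_layer(x, W), dense_layer(x[perm], W[:, perm])))  # True
```

This is exactly why an MLP would be the textbook choice, which makes the reported ablation result surprising.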

Top performance

In its best configuration, PRiMeFlow achieved state-of-the-art performance on three key benchmarks from the PerturBench platform. Two of these benchmarks test covariate transfer: the model’s ability to predict the impact of perturbations under different conditions, such as in cell types that may not have been represented in the training data. On the third, which measures predictions for combinations of perturbations, it outperformed many other models on all but one metric.
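To make covariate transfer concrete, here is a minimal, hypothetical way to build such a split: each test perturbation has been observed in some other cell type during training, but never in the held-out cell type. The `Condition` class and `covariate_transfer_split` function are illustrative names, not part of PerturBench’s API, and the sketch assumes, as one common setup, that the held-out cell type’s unperturbed (control) cells stay in the training set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One experimental condition: a cell type plus the perturbation applied to it."""
    cell_type: str
    perturbation: str  # "control" marks unperturbed cells


def covariate_transfer_split(conditions, held_out_cell_type, test_perturbations):
    """Hold out selected perturbations in one cell type; train on everything else.

    Each held-out perturbation must still appear in training in some other cell
    type, so the model has to transfer what it learned to a new context.
    """
    test = [c for c in conditions
            if c.cell_type == held_out_cell_type and c.perturbation in test_perturbations]
    train = [c for c in conditions if c not in test]
    unseen = set(test_perturbations) - {c.perturbation for c in train}
    if unseen:
        raise ValueError(f"never observed in any training cell type: {sorted(unseen)}")
    return train, test


conditions = [Condition(ct, p)
              for ct in ["A", "B", "C"]
              for p in ["control", "KO1", "KO2"]]
train, test = covariate_transfer_split(conditions, "C", {"KO1", "KO2"})
print(test)                                # KO1 and KO2 in cell type C
print(Condition("C", "control") in train)  # True: C's unperturbed cells remain visible
```

The check for unseen perturbations is the design choice that separates covariate transfer from predicting an entirely new perturbation, which is a harder and different task.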

On a private test set of human embryonic stem cells, PRiMeFlow performed exceptionally well, and fine-tuning improved its performance further. The best fine-tuned PRiMeFlow model produced the predictions closest to the in vitro results among all the models on the leaderboard.

The researchers laid out a vision for the future, suggesting that this work could form a foundation for virtual cells, which could theoretically be used to model entire virtual organisms. However, many computational and algorithmic challenges must be overcome before this vision can become reality.

Literature

[1] Watanabe, K., Panchy, N., Noguchi, S., Suzuki, H., & Hong, T. (2019). Combinatorial perturbation analysis reveals divergent regulations of mesenchymal genes during epithelial-to-mesenchymal transition. npj Systems Biology and Applications, 5(1), 21.

[2] Lotfollahi, M., Klimovskaia Susmelj, A., De Donno, C., Hetzel, L., Ji, Y., Ibarra, I. L., … & Theis, F. J. (2023). Predicting cellular responses to complex perturbations in high-throughput screens. Molecular Systems Biology, 19(6), MSB202211517.

[3] Klein, D., Fleck, J. S., Bobrovskiy, D., Zimmermann, L., Becker, S., Palma, A., … & Theis, F. J. (2025). CellFlow enables generative single-cell phenotype modeling with flow matching. bioRxiv, 2025-04.