Cell2Trial

Can single-cell foundation models map cell state to patient survival? A rigorous benchmark that led with a negative result.

Figure: UMAP of single-cell states (left) mapped to a survival prediction; simulated and observed progression-free survival curves show no statistically significant difference (log-rank p = 0.459).

Problem

Predict patient progression-free survival from single-cell transcriptomic state, in the realistic setting where cells and patients are unmatched — there are no paired cell-to-outcome labels to train on. Large single-cell foundation models are an obvious thing to reach for, but the honest question is whether they actually transfer to this task, or whether a simpler model does just as well.

Approach

I benchmarked eight methods for matching single-cell profiles to bulk cell-line references, ranging from a deliberately simple Spearman-rank baseline to zero-shot and fine-tuned foundation models (Geneformer V2, scGPT, scConcept). On the deep-learning side I added LoRA adapters and contrastive projection heads trained with a symmetric InfoNCE loss for parameter-efficient fine-tuning, and trained them across multiple GPUs with PyTorch distributed data parallel (DDP). I harmonised five large-scale datasets (50,000+ cells, 15 cancer types) against a DepMap reference of ~1,700 cell lines and evaluated on three progressively harder transfer tasks (cell-line identity, PDX→tissue, and patient-tumour→tissue) using Top-1/Top-5 accuracy and mean reciprocal rank (MRR).
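
The Spearman-rank baseline is simple enough to write out in full. The NumPy sketch below is a minimal illustration rather than the project's code: it assumes single cells and reference cell lines have already been harmonised to the same gene order, ranks each profile with average ranks for ties (the default behaviour of `scipy.stats.rankdata`), scores every cell against every reference by Spearman correlation, and computes Top-k accuracy and MRR from the position of the true reference. The function names are hypothetical.

```python
import numpy as np


def average_ranks(v: np.ndarray) -> np.ndarray:
    """Rank a 1-D vector (0-based); tied values share the mean of their ranks."""
    order = np.argsort(v, kind="stable")
    sorted_v = v[order]
    # Tie groups in the sorted vector: first position and size of each group
    _, starts, counts = np.unique(sorted_v, return_index=True, return_counts=True)
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.repeat(starts + (counts - 1) / 2.0, counts)
    return ranks


def spearman_matrix(queries: np.ndarray, refs: np.ndarray) -> np.ndarray:
    """Spearman correlation of every query row with every reference row.

    queries: (n_cells, n_genes), refs: (n_refs, n_genes), same gene order.
    Spearman's rho is the Pearson correlation of the ranks.
    """
    rq = np.apply_along_axis(average_ranks, 1, queries)
    rr = np.apply_along_axis(average_ranks, 1, refs)
    rq -= rq.mean(axis=1, keepdims=True)
    rr -= rr.mean(axis=1, keepdims=True)
    # Guard against constant rows (e.g. an all-zero cell), which get correlation 0
    rq /= np.maximum(np.linalg.norm(rq, axis=1, keepdims=True), 1e-12)
    rr /= np.maximum(np.linalg.norm(rr, axis=1, keepdims=True), 1e-12)
    return rq @ rr.T


def retrieval_metrics(scores: np.ndarray, true_idx: np.ndarray, ks=(1, 5)) -> dict:
    """Top-k accuracy and mean reciprocal rank for an (n_queries, n_refs) score matrix."""
    order = np.argsort(-scores, axis=1)                        # best reference first
    ranks = np.argmax(order == true_idx[:, None], axis=1) + 1  # 1 = true ref ranked first
    metrics = {f"top{k}": float(np.mean(ranks <= k)) for k in ks}
    metrics["mrr"] = float(np.mean(1.0 / ranks))
    return metrics


# Usage with toy data (hypothetical): 20 references x 200 genes; each query is a
# monotone transform of its true reference, so its ranks are unchanged.
rng = np.random.default_rng(0)
refs = rng.normal(size=(20, 200))
true_idx = np.array([3, 7, 11])
queries = np.exp(refs[true_idx])
print(retrieval_metrics(spearman_matrix(queries, refs), true_idx))
# -> {'top1': 1.0, 'top5': 1.0, 'mrr': 1.0}
```

Because Spearman correlation depends only on ranks, it is insensitive to monotone differences in scale between single-cell and bulk measurements, which is one plausible reason such a baseline is hard to beat across this domain gap.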
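
The two fine-tuning ingredients, a symmetric InfoNCE loss and a LoRA adapter, can each be illustrated in a few lines. This is a minimal NumPy sketch of the forward computations only (no autograd, optimiser or DDP), assuming cell and bulk embeddings arrive in paired batches where row i of each is a positive pair. The temperature of 0.07, `r=8` and `alpha=16` are illustrative defaults, not values from this project, and `LoRALinear` is a hypothetical name; it follows the standard LoRA recipe of a frozen weight plus a zero-initialised low-rank update.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(z_cell: np.ndarray, z_bulk: np.ndarray, temperature: float = 0.07) -> float:
    """CLIP-style symmetric InfoNCE: row i of z_cell and row i of z_bulk are a
    positive pair; every other row in the batch acts as a negative."""
    a = z_cell / np.linalg.norm(z_cell, axis=1, keepdims=True)
    b = z_bulk / np.linalg.norm(z_bulk, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # (batch, batch) scaled cosine similarities
    diag = np.arange(len(a))
    loss_cell_to_bulk = -log_softmax(logits, axis=1)[diag, diag].mean()  # each cell picks its bulk
    loss_bulk_to_cell = -log_softmax(logits, axis=0)[diag, diag].mean()  # each bulk picks its cell
    return float(0.5 * (loss_cell_to_bulk + loss_bulk_to_cell))


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                      # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, r))                   # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def n_trainable(self) -> int:
        # Only the adapter matrices would receive gradients
        return self.A.size + self.B.size
```

Zero-initialising `B` means the adapted layer initially reproduces the frozen layer exactly, and only `A` and `B`, far fewer values than `W` for small `r`, would need gradients during fine-tuning.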

Downstream, the per-cell predictions feed a virtual-trial simulator: drug-response metrics (growth-rate inhibition, GR) from a cell-line screening panel are transferred to individual tumour cells, and a Monte-Carlo tumour-growth model aggregates these per-cell sensitivities into population-level progression-free survival curves, which are compared against published PDX trial data (Gao et al., 2015).
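
One way to turn per-cell GR values into a progression-free curve is sketched below, purely as an illustration under stated assumptions. It inverts the GR definition, GR = 2^(k_drug / k_control) - 1, to get a net growth rate per cell. Everything else is hypothetical and not this project's simulator: the control growth rate `k_control`, building each virtual tumour by bootstrap-resampling cells, per-cell Gaussian noise on GR, defining progression as volume doubling, and the daily time grid. These are not the response criteria used by Gao et al.

```python
import numpy as np


def growth_rate_from_gr(gr: np.ndarray, k_control: float) -> np.ndarray:
    """Invert GR = 2**(k_drug / k_control) - 1 to get a per-cell net growth rate.

    GR is clipped to (-1, 1]: GR = -1 (complete kill) would map to -infinity.
    """
    gr = np.clip(gr, -0.999, 1.0)
    return k_control * np.log2(1.0 + gr)


def simulate_time_to_progression(cell_gr, n_patients=200, cells_per_tumour=500,
                                 k_control=0.05, noise_sd=0.1, progression_ratio=2.0,
                                 t_max=120.0, dt=1.0, seed=0):
    """Monte-Carlo virtual trial: one progression time per virtual patient
    (np.inf if volume never reaches progression_ratio x baseline by t_max)."""
    rng = np.random.default_rng(seed)
    times = np.arange(0.0, t_max + dt, dt)
    event_time = np.full(n_patients, np.inf)
    for p in range(n_patients):
        # Virtual tumour: bootstrap-resample per-cell GR values, plus per-cell noise
        gr = rng.choice(cell_gr, size=cells_per_tumour, replace=True)
        gr = gr + rng.normal(0.0, noise_sd, size=gr.shape)
        k = growth_rate_from_gr(gr, k_control)
        # Each sampled cell seeds an exponentially growing (or shrinking) clone;
        # tumour volume is the mean clone size relative to day 0.
        volume = np.exp(np.outer(times, k)).mean(axis=1)
        crossed = np.flatnonzero(volume >= progression_ratio)
        if crossed.size:
            event_time[p] = times[crossed[0]]
    return event_time


def fraction_progression_free(event_time, times):
    """Progression-free fraction at each time (no censoring before t_max)."""
    return np.array([(event_time > t).mean() for t in times])
```

Each virtual patient yields one progression time, and the progression-free fraction over time is the simulated curve that would then be compared with observed trial data.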

Result

On the single-cell→bulk transfer tasks, the simple Spearman-rank baseline beat every foundation model on two of the three: 61.8% vs 17.2% Top-1 on cell-line identity matching, and 52.3% vs 42.9% on patient-tumour→tissue matching. Only on PDX→tissue did a contrastively trained projection head narrowly edge ahead (40.8% vs 34.9%). Rather than bury the negative result, I made it the headline: it is direct evidence that strong simple baselines must anchor any foundation-model evaluation, and that bigger pretrained models do not automatically transfer across the single-cell-to-bulk domain gap. When the selected matching model is carried through to the survival simulation, the simulated and observed curves show no statistically significant difference (log-rank p = 0.459; see the figure above).
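
The survival comparison rests on the Kaplan-Meier estimator and the two-sample log-rank test. A self-contained NumPy sketch of both follows, assuming each arm is given as progression times plus event flags (True = progression observed, False = censored). It is the textbook test, not necessarily the code behind the reported p-value; in practice a library function such as `lifelines.statistics.logrank_test` would normally be used.

```python
import math

import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier estimate; returns (distinct event times, survival just after each)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)              # censored at t still count as at risk
        n_events = np.sum(event & (time == t))
        s *= 1.0 - n_events / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)


def logrank_test(time_a, event_a, time_b, event_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    time = np.concatenate([time_a, time_b]).astype(float)
    event = np.concatenate([event_a, event_b]).astype(bool)
    in_a = np.r_[np.ones(len(time_a), bool), np.zeros(len(time_b), bool)]
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n_a = at_risk.sum(), (at_risk & in_a).sum()
        progressed = event & (time == t)
        d, d_a = progressed.sum(), (progressed & in_a).sum()
        o_minus_e += d_a - d * n_a / n             # observed minus expected events in arm A
        if n > 1:                                  # hypergeometric variance term
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    stat = o_minus_e ** 2 / var
    return float(stat), math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
```

Simulated patients who have not progressed by the end of a simulation would enter this test as censored at that final time.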

Methods & stack

PyTorch · HuggingFace
Multi-GPU / DDP
LoRA adapters
InfoNCE contrastive learning
Geneformer V2 · scGPT · scConcept
Scanpy / AnnData
DepMap / CCLE reference
GR drug-sensitivity metrics
Monte-Carlo tumour growth
Kaplan-Meier / log-rank
Top-1/5 · MRR retrieval