Cell2Trial
Can single-cell foundation models map cell state to patient survival? A rigorous benchmark that led with a negative result.
Problem
Predict patient progression-free survival from single-cell transcriptomic state, in the realistic setting where cells and patients are unmatched — there are no paired cell-to-outcome labels to train on. Large single-cell foundation models are an obvious thing to reach for, but the honest question is whether they actually transfer to this task, or whether a simpler model does just as well.
Approach
I benchmarked eight methods for matching single-cell profiles to bulk cell-line references, ranging from a deliberately simple Spearman-rank baseline to zero-shot and fine-tuned foundation models (Geneformer V2, scGPT, scConcept). For the deep-learning methods I added LoRA adapters and contrastive projection heads (symmetric InfoNCE) for parameter-efficient fine-tuning, trained across multiple GPUs with distributed data parallel (DDP). I harmonised five large-scale datasets (50,000+ cells, 15 cancer types) against a DepMap reference of ~1,700 cell lines and evaluated on three progressively harder transfer tasks (cell-line identity, PDX→tissue, and patient-tumour→tissue) using Top-1/Top-5 accuracy and mean reciprocal rank (MRR).
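To make the baseline and the metrics concrete, here is a minimal sketch of Spearman-rank matching followed by Top-1/Top-k accuracy and MRR. It is an illustration under stated assumptions, not the project code. The function names (`rank_rows`, `spearman_scores`, `retrieval_metrics`) and the synthetic data are hypothetical. Ranks are ordinal, with ties broken by position, which is a simplification for sparse single-cell counts, where average ranks would normally be used.

```python
import numpy as np

def rank_rows(x):
    """Rank each row (1 = smallest). Ties are broken by position (simplification)."""
    order = np.argsort(x, axis=1, kind="stable")
    ranks = np.empty_like(order, dtype=float)
    rows = np.arange(x.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, x.shape[1] + 1)
    return ranks

def spearman_scores(cells, refs):
    """Spearman correlation of every cell (row) with every reference profile (row)."""
    rc, rr = rank_rows(cells), rank_rows(refs)
    # Spearman = Pearson correlation of the ranks: centre, normalise, dot product.
    rc -= rc.mean(axis=1, keepdims=True)
    rr -= rr.mean(axis=1, keepdims=True)
    rc /= np.linalg.norm(rc, axis=1, keepdims=True)
    rr /= np.linalg.norm(rr, axis=1, keepdims=True)
    return rc @ rr.T  # shape (n_cells, n_refs)

def retrieval_metrics(scores, true_idx, k=5):
    """Top-1 accuracy, Top-k accuracy and MRR from a (n_cells, n_refs) score matrix."""
    true_idx = np.asarray(true_idx)
    true_scores = scores[np.arange(len(true_idx)), true_idx]
    # Rank of the correct reference = 1 + number of references scoring strictly higher.
    ranks = 1 + (scores > true_scores[:, None]).sum(axis=1)
    return {
        "top1": float((ranks == 1).mean()),
        f"top{k}": float((ranks <= k).mean()),
        "mrr": float((1.0 / ranks).mean()),
    }

# Synthetic usage: 6 reference profiles, 8 cells that are noisy monotone
# transforms of known references (all values are made up for illustration).
rng = np.random.default_rng(0)
refs = rng.normal(size=(6, 200))
true_idx = np.array([0, 1, 2, 3, 4, 5, 0, 3])
cells = np.exp(refs[true_idx]) + 0.01 * rng.normal(size=(8, 200))
metrics = retrieval_metrics(spearman_scores(cells, refs), true_idx, k=5)
print(metrics)
```

Because Spearman correlation depends only on gene ranks, the baseline is unaffected by any monotone rescaling between the single-cell and bulk measurements. That may be part of why it holds up across the domain gap.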
Downstream, the per-cell sensitivity predictions feed a virtual-trial simulator: drug-response (GR) metrics are transferred from a cell-line screening panel to individual tumour cells, then a Monte-Carlo tumour-growth model aggregates them into population progression-free survival curves, compared against published PDX trial data (Gao et al., 2015).
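The aggregation step can be sketched as follows. This is a toy illustration, not the simulator used in the project. It assumes the standard definition of the GR value, GR = 2^(k_treated / k_control) - 1, so that a per-cell treated growth rate can be recovered from a predicted GR. The function names, the resampling of cells from a shared pool, the progression criterion (volume doubling), and every parameter value (`k_control`, `progression_fold`, `cells_per_patient`, `horizon_days`) are hypothetical placeholders.

```python
import numpy as np

def simulate_pfs(gr_values, n_patients=200, cells_per_patient=50,
                 k_control=0.1, horizon_days=100, progression_fold=2.0, seed=0):
    """Monte-Carlo virtual trial: per-patient progression time (days) and event flag.

    k_control is the untreated growth rate in doublings per day (assumed value).
    Patients that never reach progression_fold are censored at horizon_days.
    """
    rng = np.random.default_rng(seed)
    days = np.arange(horizon_days + 1)
    times = np.full(n_patients, float(horizon_days))
    events = np.zeros(n_patients, dtype=bool)
    for p in range(n_patients):
        # Draw this virtual patient's cells from the pool of per-cell GR predictions.
        gr = rng.choice(gr_values, size=cells_per_patient, replace=True)
        # Invert GR = 2**(k_treated / k_control) - 1; clip to avoid log2(0) at GR = -1.
        k_treated = k_control * np.log2(np.clip(1.0 + gr, 1e-3, None))
        # Relative tumour volume: mean of each cell's exponential trajectory.
        volume = np.exp2(np.outer(days, k_treated)).mean(axis=1)
        crossed = np.nonzero(volume >= progression_fold)[0]
        if crossed.size:
            times[p] = days[crossed[0]]
            events[p] = True
    return times, events

def kaplan_meier(times, events):
    """Kaplan-Meier estimator: returns (event_times, survival_probabilities)."""
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = (times >= t).sum()
        d = ((times == t) & events).sum()
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Usage with made-up GR predictions spread between cytotoxic (-1) and no effect (+1).
pool = np.random.default_rng(1).uniform(-0.5, 1.0, size=5000)
t, e = simulate_pfs(pool)
km_t, km_s = kaplan_meier(t, e)
```

Averaging exponential trajectories lets the least sensitive cells dominate late tumour volume, which is the reason for simulating per-cell responses rather than applying one average response per tumour.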
Result
On two of the three single-cell→bulk transfer tasks the simple Spearman-rank baseline beat every foundation model: 61.8% vs 17.2% Top-1 on cell-line identity matching, and 52.3% vs 42.9% on patient-tumour tissue matching. Only on PDX→tissue did a contrastive-trained projection head edge ahead (40.8% vs 34.9%). Rather than bury the negative result, I made it the headline: it is direct evidence that strong simple baselines must anchor any foundation-model evaluation, and that bigger pretrained models do not automatically transfer across the single-cell-to-bulk domain gap. When the working model is carried through to survival, a log-rank test finds no significant difference between the simulated and observed curves (p = 0.459). That is consistent with agreement, although a non-significant test does not by itself establish equivalence.
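For reference, the log-rank comparison between simulated and observed survival can be sketched with the textbook two-sample statistic. This is a generic implementation for illustration, not the analysis code behind the reported p-value. The function name `logrank_test` and the toy inputs are hypothetical.

```python
import math
import numpy as np

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test. Returns (chi-square statistic, p-value), 1 d.o.f."""
    times = np.concatenate([times_a, times_b]).astype(float)
    events = np.concatenate([events_a, events_b]).astype(bool)
    in_a = np.concatenate([np.ones(len(times_a), bool), np.zeros(len(times_b), bool)])
    obs_minus_exp, variance = 0.0, 0.0
    for t in np.unique(times[events]):
        at_risk = times >= t
        n = at_risk.sum()                              # at risk in both groups
        n_a = (at_risk & in_a).sum()                   # at risk in group A
        d = ((times == t) & events).sum()              # events at t, both groups
        d_a = ((times == t) & events & in_a).sum()     # events at t, group A
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a under the null hypothesis.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = float(obs_minus_exp ** 2 / variance)
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Toy usage with made-up progression times (days); all events observed.
sim = np.array([5, 8, 12, 20, 25, 31])
obs = np.array([6, 9, 11, 18, 27, 30])
stat, p = logrank_test(sim, np.ones(6, bool), obs, np.ones(6, bool))
```

A large p-value here only means the test could not separate the two curves. With small cohorts the test has limited power, so the result is best read alongside the curves themselves.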