Conditional Causal Representation Learning for Heterogeneous Single-cell RNA Data Integration and Prediction

Jiayi Dong, Jiahao Li, Fei Wang

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 7392-7400. https://doi.org/10.24963/ijcai.2025/822

Single-cell sequencing technology provides deep insights into gene activity at the individual cell level, facilitating the study of gene regulatory mechanisms. However, observed gene expression is often influenced by confounding factors such as batch effects, perturbations, and spatial position, which obscure the true gene regulatory network that governs the cell’s intrinsic state. To address these challenges, we propose scConCRL, a novel conditionally causal representation learning framework designed to extract the true gene regulatory relationships independent of confounding information. By considering both fine-grained molecular gene variables and coarse-grained latent domain variables, scConCRL not only uncovers the intrinsic biological signals but also models the complex relationships between these variables. This dual function enables the separation of genuine cellular states from domain information, providing valuable insights for downstream analyses and biological discovery. We demonstrate the effectiveness of our model on multi-domain datasets from different platforms and perturbation conditions, showing its ability to accurately disentangle confounding influences and discover novel gene relationships. Extensive comparisons across various scenarios show that scConCRL outperforms existing methods on several tasks.
Keywords:
Multidisciplinary Topics and Applications: MTA: Bioinformatics
Machine Learning: ML: Applications
Multidisciplinary Topics and Applications: MTA: Other