Abstract: RNA molecules provide an exciting frontier for novel therapeutics. Accurate determination of RNA structure could accelerate development of therapeutics through an improved understanding of function. However, the extremely large conformation space has kept the RNA 3D structure space largely unresolved. Using recent advances in generative modeling, we propose DiffRNAFold, a latent space diffusion model for RNA tertiary structure design. Our preliminary results suggest that DiffRNAFold-generated molecules are similar in 3D space to true RNA molecules, providing an important first step towards accurate structure and function prediction in vivo.
Clarify
ISMB
CLARIFY: Cell-cell interaction and gene regulatory network refinement from spatially resolved transcriptomics
Motivation: Gene regulatory networks (GRNs) in a cell provide the tight feedback needed to synchronize cell actions. However, genes in a cell also take input from, and provide signals to, neighboring cells. These cell-cell interactions (CCIs) and the GRNs deeply influence each other. Many computational methods have been developed for GRN inference in cells. More recently, methods were proposed to infer CCIs using single cell gene expression data with or without cell spatial location information. However, in reality, the two processes do not exist in isolation and are subject to spatial constraints. Despite this rationale, no methods currently exist to infer GRNs and CCIs using the same model. Results: We propose CLARIFY, a tool that takes GRNs as input and uses them together with spatially resolved gene expression data to infer CCIs, while simultaneously outputting refined cell-specific GRNs. CLARIFY uses a novel multi-level graph autoencoder, which mimics cellular networks at a higher level and cell-specific GRNs at a deeper level. We applied CLARIFY to two real spatial transcriptomic datasets, one using seqFISH and the other using MERFISH, and also tested on simulated datasets from scMultiSim. We compared the quality of predicted GRNs and CCIs with state-of-the-art baseline methods that inferred either only GRNs or only CCIs. The results show that CLARIFY consistently outperforms the baselines in terms of commonly used evaluation metrics. Our results point to the importance of co-inference of CCIs and GRNs and to the use of layered graph neural networks as an inference tool for biological networks.
2022
ST-CCI
NSUR
Benchmarking and Refining Cell-Cell Interactions with Spatial Transcriptomics and Deep Learning
The (mal-)functioning of human tissues can be attributed to genes that are active (expressed) or repressed relative to expectations. New genomic technologies allow measurements not only at single cell (sc) resolution, but also retain information on the spatial location of the cell. These Spatial Transcriptomics (ST) technologies could revolutionize human health. For example, they offer an unprecedented look at the tumor microenvironment, revealing the infiltrating immune cells and their interactions with their cancerous counterparts. Many computational methods now analyze the complex, high dimensional ST data for inferring these cell-cell interactions (CCIs). However, the ST community lacks a centralized ground truth to holistically evaluate these tools. Here, (a) we systematically benchmark existing methods and (b) suggest a deep learning method for refining ST-CCI prediction. We evaluated 7 methods, including CellPhoneDB and DeepLinc, on 10 simulated datasets at 5 noise levels as well as 2 real datasets generated using seqFISH and MERFISH. CellPhoneDB achieved an average precision/recall of 0.79/0.75, respectively, with the recall reducing to 0.65 for certain datasets. DeepLinc only achieved 0.68 prediction accuracy on seqFISH data after being trained with labeled data. Additionally, the ROC curves were surprisingly linear, suggesting that increasing true positive rate comes only with increasing false positive rate. All other methods resulted in similar performance and the same pitfalls: failing to properly utilize either the spatial information or downstream gene regulatory network (GRN) interactions, thus increasing the false positive interactions. Our work provides, for the first time, a curated data resource for future tool comparisons and a systematic analysis of the shortcomings of existing methods.
Lastly, to address these drawbacks, we deployed a preliminary version of a subgraph neural network, where we represent each cell by a subgraph of its underlying GRN, obtaining lower-dimensional representations of each cell with GRN information embedded. These embeddings incorporate gene expression, spatial information, and GRN activity, thereby allowing us to refine ST-CCI inference. This has myriad applications, such as revealing interactions between co-located immune cells and tumor cells, addressing the central biological problem of why certain tumors are “immunologically hot” and respond better to immuno-oncotherapy.
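The precision and recall figures quoted in the abstract above compare a set of predicted cell-cell interactions against a ground-truth set. A minimal sketch of that comparison follows, assuming interactions are undirected pairs of cell identifiers; the function names and the toy cell IDs are hypothetical and are not taken from the benchmark itself.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]


def as_undirected(edges: Iterable[Edge]) -> Set[FrozenSet[str]]:
    """Collapse (a, b) and (b, a) into one undirected edge; drop self-loops."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def precision_recall(predicted: Iterable[Edge],
                     truth: Iterable[Edge]) -> Tuple[float, float]:
    """Precision and recall of predicted interactions against ground truth."""
    pred = as_undirected(predicted)
    true = as_undirected(truth)
    true_positives = len(pred & true)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(true) if true else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy example with made-up cell IDs (assumption, for illustration only).
    predicted = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c2", "c1")]
    truth = [("c1", "c2"), ("c3", "c4"), ("c4", "c5"), ("c1", "c5")]
    p, r = precision_recall(predicted, truth)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Treating edges as unordered sets keeps a method from being rewarded twice for reporting the same interaction in both directions; a directed ligand-receptor benchmark would need ordered pairs instead.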
DeepViFi
ACM-BCB
DeepViFi: Detecting Oncoviral Infections in Cancer Genomes Using Transformers
Utkrisht Rajkumar, Sara Javadzadeh, Mihir Bafna, Dongxia Wu, Rose Yu, Jingbo Shang, and Vineet Bafna
In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (ACM-BCB), Oct 2022
We consider the problem of identifying viral reads in human host genome data. We pose the problem as open-set classification as reads can originate from unknown sources such as bacterial and fungal genomes. Sequence-matching methods have low sensitivity in recognizing viral reads when the viral family is highly diverged. Hidden Markov models have higher sensitivity but require domain-specific training and are difficult to repurpose for identifying different viral families. Supervised learning methods can be trained with little domain-specific knowledge but have reduced sensitivity in open-set scenarios. We present DeepViFi, a transformer-based pipeline, to detect viral reads in short-read whole genome sequence data. At 90% precision, DeepViFi achieves 90% recall compared to 15% for other deep learning methods. DeepViFi provides a semi-supervised framework to learn representations of viral families without domain-specific knowledge, and to rapidly and accurately identify target sequences in open-set settings.
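The "recall at 90% precision" figure reported above is a threshold-based metric: reads are ranked by a classifier score, and recall is read off at the most permissive threshold that still meets the precision target. The sketch below illustrates that metric only; it is not DeepViFi's evaluation code, and the function name, scores, and labels are hypothetical.

```python
from typing import Sequence


def recall_at_precision(scores: Sequence[float],
                        labels: Sequence[int],
                        min_precision: float = 0.9) -> float:
    """Highest recall reachable by any score threshold whose precision
    is at least `min_precision`. Labels are 1 (target) or 0 (other)."""
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    # Rank reads from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best_recall = 0.0
    true_positives = 0
    for rank, i in enumerate(order, start=1):
        true_positives += labels[i]
        # Tied scores cannot be split by a threshold, so only evaluate
        # once the whole tie group has been counted.
        if rank < len(order) and scores[order[rank]] == scores[i]:
            continue
        if true_positives / rank >= min_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall


if __name__ == "__main__":
    # Made-up scores and labels (assumption, for illustration only).
    scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
    labels = [1, 1, 1, 0, 1, 0]
    print(recall_at_precision(scores, labels, min_precision=0.9))
```

In an open-set setting, reads scoring below the chosen threshold are rejected as "unknown" rather than forced into a known class, which is why the operating point is fixed by precision and not by a default 0.5 cutoff.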
2021
MetaDetect
Computer-implemented methods for quantitation of features of interest in whole slide imaging
Nam Nguyen, Lorena Mora-Blanco, Kristen Turner, Julie Weise, Jason Christiansen, and Mihir Bafna